Stacking with MultiIndex column DataFrame throws an error #3170

Closed
hayd opened this Issue · 3 comments

Andy Hayden
Collaborator
hayd commented

Stacking this particular MultiIndex column DataFrame throws an error:

import pandas as pd
from StringIO import StringIO

csv = StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")

df = pd.read_csv(csv, index_col=['DATE', 'ID'], parse_dates=['DATE'])
df.columns.name = 'Params'

In [11]: df.unstack('ID').resample('W-THU')
Out[11]: 
Params      VAR1      
ID             1     2
DATE                  
2013-01-03    69  69.0
2013-01-10    76  73.5

It looks like you ought to be able to stack this over 'ID', but it throws an error:

In [12]: df.unstack('ID').resample('W-THU').stack('ID')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-12-e61b475b7bed> in <module>()
----> 1 df.unstack('ID').resample('W-THU').stack('ID')

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in stack(self, level, dropna)
   3950             return result
   3951         else:
-> 3952             return stack(self, level, dropna=dropna)
   3953 
   3954     def unstack(self, level=-1):

/Library/Python/2.7/site-packages/pandas/core/reshape.pyc in stack(frame, level, dropna)
    443 
    444     if isinstance(frame.columns, MultiIndex):
--> 445         return _stack_multi_columns(frame, level=level, dropna=True)
    446     elif isinstance(frame.index, MultiIndex):
    447         new_levels = list(frame.index.levels)

/Library/Python/2.7/site-packages/pandas/core/reshape.pyc in _stack_multi_columns(frame, level, dropna)
    504         # can make more efficient?
    505         if loc.stop - loc.start != levsize:
--> 506             chunk = this.ix[:, this.columns[loc]]
    507             chunk.columns = level_vals.take(chunk.columns.labels[-1])
    508             value_slice = chunk.reindex(columns=level_vals).values

/Library/Python/2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
     32                 pass
     33 
---> 34             return self._getitem_tuple(key)
     35         else:
     36             return self._getitem_axis(key, axis=0)

/Library/Python/2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    222                 continue
    223 
--> 224             retval = retval.ix._getitem_axis(key, axis=i)
    225 
    226         return retval

/Library/Python/2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
    340                 raise ValueError('Cannot index with multidimensional key')
    341 
--> 342             return self._getitem_iterable(key, axis=axis)
    343         elif axis == 0:
    344             is_int_index = _is_integer_index(labels)

/Library/Python/2.7/site-packages/pandas/core/indexing.pyc in _getitem_iterable(self, key, axis)
    406             # this is not the most robust, but...
    407             if (isinstance(labels, MultiIndex) and
--> 408                     not isinstance(keyarr[0], tuple)):
    409                 level = 0
    410             else:

/Library/Python/2.7/site-packages/pandas/core/index.pyc in __getitem__(self, key)
   1745             retval = []
   1746             for lev, lab in zip(self.levels, self.labels):
-> 1747                 if lab[key] == -1:
   1748                     retval.append(np.nan)
   1749                 else:

IndexError: index out of bounds

Taken from this SO question.

Andy Hayden
Collaborator
hayd commented

FYI, this has been answered on SO:

"The example unstacks a non-numerical column 'NAME' which is silently dropped but causes problems during re-stacking."

print df[['VAR1']].unstack('ID').resample('W-THU').stack('ID')
Params         VAR1
DATE       ID
2013-01-03 1   69.0
           2   69.0
2013-01-10 1   76.0
           2   73.5

Still seems strange that this is needed (?)
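
For anyone reproducing this on a recent pandas release, the workaround above can be sketched roughly as follows. Two things here are assumptions rather than part of the original report: the frame is built directly instead of through `StringIO`, and `.mean()` is called explicitly after `resample`, since newer releases return a resampler object rather than aggregating by default.

```python
import pandas as pd

# Same data as the CSV in the report, with dates written in ISO form.
records = [
    (1, "a", "2013-01-03", 69),
    (1, "a", "2013-01-04", 77),
    (1, "a", "2013-01-05", 75),
    (2, "b", "2013-01-03", 69),
    (2, "b", "2013-01-04", 75),
    (2, "b", "2013-01-05", 72),
]
df = pd.DataFrame(records, columns=["ID", "NAME", "DATE", "VAR1"])
df["DATE"] = pd.to_datetime(df["DATE"])
df = df.set_index(["DATE", "ID"])
df.columns.name = "Params"

# Keep only the numeric column before unstacking, so the non-numeric
# 'NAME' column never enters the MultiIndex columns.
weekly = df[["VAR1"]].unstack("ID").resample("W-THU").mean()

# Move 'ID' back from the columns into the row index.
restacked = weekly.stack("ID")
print(restacked)
```

Selecting `VAR1` first means the column MultiIndex only ever contains values that are actually present, which sidesteps the re-stacking problem described in the quoted answer.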

Chang She
Collaborator

Fixed in master

In [12]: df
Out[12]: 
Params        NAME  VAR1
DATE       ID           
2013-01-03 1     a    69
2013-01-04 1     a    77
2013-01-05 1     a    75
2013-01-03 2     b    69
2013-01-04 2     b    75
2013-01-05 2     b    72

In [13]: df.unstack('ID').resample('W-THU')
Out[13]: 
Params      VAR1      
ID             1     2
DATE                  
2013-01-03    69  69.0
2013-01-10    76  73.5

Andy Hayden
Collaborator
hayd commented

@changhiskhan I think this is still visible in master:

df.unstack('ID').resample('W-THU').stack('ID')
...
IndexError: index out of bounds

?

Wes McKinney wesm reopened this
Chang She changhiskhan referenced this issue from a commit in changhiskhan/pandas
Chang She BUG: stacking with MultiIndex column with some unused level uniques fails #3170
be58573
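
The commit title points at level values that are no longer in use. As a small illustration (not the patch itself), the sketch below shows that selecting a subset of MultiIndex columns keeps the full list of level values even when some of them no longer appear in any column. The frame and the names `wide`, `numeric_only` and `trimmed` are made up for the example, and `remove_unused_levels` is a method from later pandas releases that did not exist when this issue was filed.

```python
import pandas as pd

# Hypothetical wide frame shaped like the unstacked data in this issue.
columns = pd.MultiIndex.from_product(
    [["NAME", "VAR1"], [1, 2]], names=["Params", "ID"]
)
wide = pd.DataFrame(
    [["a", "b", 69, 69], ["a", "b", 77, 75]], columns=columns
)

# Dropping the 'NAME' columns leaves 'NAME' behind as an unused level value.
numeric_only = wide[["VAR1"]]
print(list(numeric_only.columns.levels[0]))

# Later pandas releases can prune level values that no column refers to.
trimmed = numeric_only.columns.remove_unused_levels()
print(list(trimmed.levels[0]))
```

Code that sizes its output from the level values rather than from the columns actually present can be tripped up by this kind of leftover entry, which appears to be the situation the commit addresses.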
Andy Hayden hayd closed this