BUG: unstacking with partial selection and mixed dtype gives "ValueError: shape of passed values ..." #7405

Closed
jorisvandenbossche opened this issue Jun 9, 2014 · 1 comment · Fixed by #9292
Labels
Bug MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@jorisvandenbossche
Member

Yet another unstacking bug (this time not related to NaNs, unlike #7403). Under some specific conditions, with mixed dtypes and a selection of the rows, unstacking raises ValueError: Shape of passed values is (2, 3), indices imply (2, 5) (where 3 is the correct number of rows and 5 is the number of rows in the original frame):

In [23]: df = pd.DataFrame({'A': ['a']*5,
   ....:                    'B':pd.date_range('2012-01-01', periods=5),
   ....:                    'C':np.zeros(5),
   ....:                    'D':np.zeros(5)})

In [25]: df = df.set_index(['A', 'B'])

In [26]: df
Out[26]:
              C  D
A B
a 2012-01-01  0  0
  2012-01-02  0  0
  2012-01-03  0  0
  2012-01-04  0  0
  2012-01-05  0  0

Unstacking this frame, or a selection of it, works as expected:

In [27]: df.unstack(0)
Out[27]:
            C  D
A           a  a
B
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0
2012-01-04  0  0
2012-01-05  0  0

In [28]: df.iloc[:3].unstack(0)
Out[28]:
            C  D
A           a  a
B
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0

But when the dataframe has mixed dtypes:

In [29]: df['D'] = df['D'].astype('int64')

In [31]: df.dtypes
Out[31]:
C    float64
D      int64
dtype: object

unstacking the full frame still works, but unstacking the selection no longer does:

In [32]: df.unstack(0)
Out[32]:
            C  D
A           a  a
B
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0
2012-01-04  0  0
2012-01-05  0  0

In [33]: df.iloc[:3].unstack(0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in _verify_integrity(self)
   2090         for block in self.blocks:
   2091             if not block.is_sparse and block.shape[1:] != mgr_shape[1:]:
-> 2092                 construction_error(tot_items, block.shape[1:], self.axes)
   2093         if len(self.items) != tot_items:
   2094             raise AssertionError('Number of manager items must equal union of '

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3162         raise e
   3163     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3164         passed,implied))
   3165
   3166

ValueError: Shape of passed values is (2, 3), indices imply (2, 5)
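
To check whether an installed pandas still shows this behaviour, the steps above can be collected into one script. This is only a sketch: the helper name unstack_selection_outcome is made up for illustration, and since the issue is marked as fixed by #9292, a recent version would be expected to take the non-error branch.

```python
import numpy as np
import pandas as pd


def unstack_selection_outcome():
    """Rebuild the frame from the report and try to unstack a row selection.

    Hypothetical helper (not part of pandas): returns a short string that
    describes whether the unstack succeeded or raised ValueError.
    """
    df = pd.DataFrame({'A': ['a'] * 5,
                       'B': pd.date_range('2012-01-01', periods=5),
                       'C': np.zeros(5),
                       'D': np.zeros(5)})
    df = df.set_index(['A', 'B'])

    # Mixed dtypes: C stays float64, D becomes int64.
    df['D'] = df['D'].astype('int64')

    try:
        # Select the first 3 rows, then move level A into the columns.
        result = df.iloc[:3].unstack(0)
    except ValueError as exc:
        return 'ValueError: {0}'.format(exc)
    return 'ok, shape {0}'.format(result.shape)


outcome = unstack_selection_outcome()
print(pd.__version__, outcome)
```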

If the index is reset and then set again (so that the levels and labels are recomputed from the selection), it works again:

In [34]: df.iloc[:3].reset_index().set_index(['A', 'B']).unstack(0)
Out[34]:
            C  D
A           a  a
B
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0
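
The difference between the selection and the rebuilt frame can be inspected directly. The sketch below counts the values stored in the B level of the index for the sliced frame, for the frame after the reset_index/set_index round trip, and for a copy whose index went through MultiIndex.remove_unused_levels(). That method was added to pandas after this issue was filed, so using it here is an assumption about the installed version, not part of the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a'] * 5,
                   'B': pd.date_range('2012-01-01', periods=5),
                   'C': np.zeros(5),
                   'D': np.zeros(5, dtype='int64')})
df = df.set_index(['A', 'B'])

sel = df.iloc[:3]

# The selection has 3 rows, but slicing keeps the original level values,
# so level B still stores all 5 dates.
n_rows = len(sel)
n_level_values = len(sel.index.levels[1])

# Rebuilding the index from the selected rows recomputes the levels.
rebuilt = sel.reset_index().set_index(['A', 'B'])
n_rebuilt = len(rebuilt.index.levels[1])

# Assumption: a pandas version that provides remove_unused_levels().
# It trims the levels without the reset_index/set_index round trip.
trimmed = sel.copy()
trimmed.index = sel.index.remove_unused_levels()
n_trimmed = len(trimmed.index.levels[1])

print(n_rows, n_level_values, n_rebuilt, n_trimmed)
```

The stale level size (5) matches the "indices imply (2, 5)" part of the error message, which is consistent with the observation that recomputing the levels avoids the problem.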

I am not sure exactly under which circumstances this happens: I can't reproduce it with a small example that has different values in the A index level (here it is only a), but in the large real dataframe where I ran into it, there were multiple levels.

@jorisvandenbossche
Member Author

@cpcloud another one :-)
