BUG: incorrect unstacking with NaNs in the index #7403

Closed
jorisvandenbossche opened this issue Jun 9, 2014 · 2 comments · Fixed by #9292
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@jorisvandenbossche
Member

Related to #7401, but I think this is a separate issue. There is more strange behaviour with NaNs in the index when unstacking (this time not specific to datetimes).

First case:

In [9]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ...:                    'B':range(8),
   ...:                    'C':range(8)})

In [10]: df.set_index(['A', 'B']).unstack(0)
Out[10]:
    C
A   a   b
B
0   0 NaN
1   1 NaN
2   2 NaN
3   3 NaN
4 NaN   4
5 NaN   5
6 NaN   6
7 NaN   7

In [11]: df.iloc[3,1] = np.NaN

In [12]: df.set_index(['A', 'B']).unstack(0)
Out[12]:
      C
A     a   b
B
 0    3 NaN
 1    0 NaN
 2    1 NaN
NaN NaN NaN
 4  NaN   2
 5  NaN   4
 6  NaN   5
 7    6   7

The values in the first column are totally mixed up.

Second case (with repeating values in the second level):

In [13]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B':list(range(4))*2,
   ....:                    'C':range(8)})
In [14]: df
Out[14]:
   A  B  C
0  a  0  0
1  a  1  1
2  a  2  2
3  a  3  3
4  b  0  4
5  b  1  5
6  b  2  6
7  b  3  7

In [15]: df.set_index(['A', 'B']).unstack(0)
Out[15]:
   C
A  a  b
B
0  0  4
1  1  5
2  2  6
3  3  7

In [16]: df.iloc[2,1] = np.NaN

In [17]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-2f4735e48b98> in <module>()
----> 1 df.set_index(['A', 'B']).unstack(0)

...

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in _make_selectors(self)
    139
    140         if mask.sum() < len(self.index):
--> 141             raise ValueError('Index contains duplicate entries, '
    142                              'cannot reshape')
    143

ValueError: Index contains duplicate entries, cannot reshape

And a different error message when the NaN is in the last position (of the sublevel):

In [20]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B':list(range(4))*2,
   ....:                    'C':range(8)})
In [21]: df.iloc[3,1] = np.NaN

In [22]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

...

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in get_result(self)

    173                 values_indexer = com._ensure_int64(l[~mask])
    174                 for i, j in enumerate(values_indexer):
--> 175                     values[j] = orig_values[i]
    176             else:
    177                 index = index.take(self.unique_groups)

IndexError: index 4 is out of bounds for axis 0 with size 4

I know NaNs in the index are not really recommended, but I am just exploring this: I was caught by such an issue, and when you get errors like these you don't always think to check whether the index contains NaNs.
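One quick way to rule this out before unstacking is to check each index level for missing values. The sketch below is only illustrative: `levels_with_nan` is a hypothetical helper, not part of pandas, and it assumes a pandas version that provides `Index.isna`.

```python
import numpy as np
import pandas as pd


def levels_with_nan(index):
    """Return the names (or positions, if unnamed) of index levels holding NaN.

    Hypothetical helper for illustration; not a pandas API.
    """
    found = []
    for i in range(index.nlevels):
        # get_level_values also works on a plain (single-level) Index.
        if index.get_level_values(i).isna().any():
            name = index.names[i]
            found.append(name if name is not None else i)
    return found


# Same shape of data as in the report, with a NaN in the 'B' column.
df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0, 1, 2, np.nan, 0, 1, 2, 3],
                   'C': range(8)})
indexed = df.set_index(['A', 'B'])

bad_levels = levels_with_nan(indexed.index)
print(bad_levels)
```

If the returned list is non-empty, the NaNs can be dropped or filled before calling `unstack`.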

jreback added this to the 0.14.1 milestone Jun 9, 2014
@jreback
Contributor

jreback commented Jun 9, 2014

OK, either fix it or see if we can put a better error message in place.
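As a rough illustration of the "better error message" option, a guard along these lines could run before the reshape. This is a user-level sketch under my own assumptions, not the change that went into pandas; `unstack_checked` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def unstack_checked(frame, level=-1):
    """Unstack `frame`, but fail early with a clear message if any index
    level contains NaN. Hypothetical wrapper, not a pandas API."""
    index = frame.index
    for i in range(index.nlevels):
        if index.get_level_values(i).isna().any():
            label = index.names[i] if index.names[i] is not None else i
            raise ValueError(
                f"index level {label!r} contains NaN; "
                "drop or fill it before unstacking"
            )
    return frame.unstack(level)


clean = pd.DataFrame({'A': list('aaaabbbb'),
                      'B': [0, 1, 2, 3, 0, 1, 2, 3],
                      'C': range(8)}).set_index(['A', 'B'])
wide = unstack_checked(clean, 0)  # no NaN in the index, so this reshapes

dirty = pd.DataFrame({'A': list('aaaabbbb'),
                      'B': [0, 1, np.nan, 3, 0, 1, 2, 3],
                      'C': range(8)}).set_index(['A', 'B'])
try:
    unstack_checked(dirty, 0)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

The check runs once per index level, so it points at the offending level directly instead of surfacing as a "duplicate entries" or out-of-bounds error from inside the reshape code.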

cpcloud self-assigned this Jun 9, 2014
@cpcloud
Member

cpcloud commented Jun 9, 2014

this too
