BUG: pivot with nans gives a seemingly wrong result #7466

Closed
cpcloud opened this issue Jun 14, 2014 · 4 comments · Fixed by #9061
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@cpcloud
Member

cpcloud commented Jun 14, 2014

related to #3588

This test

import numpy as np
import pandas.testing as tm
from pandas import DataFrame, Index

def test_pivot_index_with_nan(self):
    # GH 3588
    nan = np.nan
    df = DataFrame({'a': ['R1', 'R2', nan, 'R4'],
                    'b': ['C1', 'C2', 'C3', 'C4'],
                    'c': [10, 15, nan, 20]})
    result = df.pivot(index='a', columns='b', values='c')
    # hand-constructed expectation (the result this issue questions)
    expected = DataFrame([[nan, nan, nan, nan], [nan, 10, nan, nan],
                          [nan, nan, nan, nan], [nan, nan, 15, 20]],
                         index=Index(['R1', 'R2', nan, 'R4'], name='a'),
                         columns=Index(['C1', 'C2', 'C3', 'C4'], name='b'))
    tm.assert_frame_equal(result, expected)

seems very odd to me, even though the expected result is constructed by hand.

Here are df and result:

In [2]: df
Out[2]:
     a   b   c
0   R1  C1  10
1   R2  C2  15
2  NaN  C3 NaN
3   R4  C4  20

In [3]: result
Out[3]:
b    C1  C2  C3  C4
a
R1  NaN NaN NaN NaN
R2  NaN  10 NaN NaN
NaN NaN NaN NaN NaN
R4  NaN NaN  15  20

As I understand it, pivot makes a DataFrame with a as the index and b as the columns, then fills it using the third argument (here c) as the values, so that each (a, b) pair acts as a coordinate for one value of c. Thus, instead of the current output, I would expect the result to be

In [14]: e
Out[14]:
b    C1  C2  C3  C4
a
R1   10 NaN NaN NaN
R2  NaN  15 NaN NaN
NaN NaN NaN NaN NaN
R4  NaN NaN NaN  20
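
To make the "coordinate system" reading concrete, here is a minimal pure-Python sketch of the semantics described above; it is not pandas' implementation. Each row's (a, b) pair addresses exactly one cell, and every NaN label is collapsed into a single NaN row (NaN != NaN, so a canonical stand-in is needed). The names pivot_reference, canon and NAN_LABEL are hypothetical, and labels are kept in first-appearance order, which may differ from pandas, which sorts them.

```python
import math

NAN = float("nan")


class _NaNLabel:
    """Single stand-in for every NaN label, because NaN != NaN."""

    def __repr__(self):
        return "NaN"


NAN_LABEL = _NaNLabel()


def canon(label):
    # Collapse all NaN labels onto one hashable key (hypothetical helper)
    if isinstance(label, float) and math.isnan(label):
        return NAN_LABEL
    return label


def pivot_reference(index, columns, values):
    """Hypothetical reference pivot: cell (index[i], columns[i]) holds values[i]."""
    # Unique labels in first-appearance order (not sorted)
    row_labels = list(dict.fromkeys(canon(x) for x in index))
    col_labels = list(dict.fromkeys(canon(x) for x in columns))
    cells = {}
    for r, c, v in zip(index, columns, values):
        key = (canon(r), canon(c))
        if key in cells:
            raise ValueError(f"duplicate entry for {key}")
        cells[key] = v
    # Any (row, column) pair with no matching input row stays NaN
    grid = [[cells.get((r, c), NAN) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, grid


rows, cols, grid = pivot_reference(
    ["R1", "R2", NAN, "R4"], ["C1", "C2", "C3", "C4"], [10, 15, NAN, 20]
)
for label, row in zip(rows, grid):
    print(label, row)
```

Under this reading, 10, 15 and 20 land on the diagonal cells named by their (a, b) pairs and the NaN row is all NaN, which is the expected frame e above.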

If, on the other hand, you don't have any NaNs in your frame, the result is what I would expect:

In [16]: df.loc[2, 'a'] = 'R3'

In [17]: df.loc[2, 'c'] = 17

In [18]: df
Out[18]:
    a   b   c
0  R1  C1  10
1  R2  C2  15
2  R3  C3  17
3  R4  C4  20

In [19]: df.pivot('a','b','c')
Out[19]:
b   C1  C2  C3  C4
a
R1  10 NaN NaN NaN
R2 NaN  15 NaN NaN
R3 NaN NaN  17 NaN
R4 NaN NaN NaN  20

I'll have a look at #3588 to see if this is a regression, or if I'm just misunderstanding how this is supposed to work.

I suspect this is related to #7403.

@cpcloud cpcloud added this to the 0.14.1 milestone Jun 14, 2014
@cpcloud
Member Author

cpcloud commented Jun 15, 2014

@jreback, could use a hand here...

I've figured out what the issue is, but I'm not totally sure how to fix it.

In _Unstacker._make_selectors(), a comp_index (compressed index) is created that holds the integer indices of the remaining labels.

It then creates a selector that places each value in the cell where it belongs as a result of the unstack operation.

This is done by

selector = self.sorted_labels[-1] + stride * comp_index

The problem is that if the "remaining labels" contain NaN, comp_index will contain -1, which gives selector values like [-2, 0, 5, 11]. That is wrong, because it ultimately leads to the incorrect result above.
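
The selector arithmetic can be reproduced in a few lines of pure Python. The codes below are an assumption, reconstructed so that the quoted formula yields the [-2, 0, 5, 11] mentioned above: rows sorted by (a, b), 'a' levels R1, R2, R4 with NaN coded -1, and comp_index taken to equal those row codes. The remapping step, which gives NaN its own slot after the real levels, is only a hypothetical fix direction and not necessarily what the eventual fix does; make_selector, remap_nan_codes and fill_grid are illustrative names.

```python
import math


def make_selector(col_codes, row_codes, stride):
    """Flat cell index per value, mirroring
    selector = self.sorted_labels[-1] + stride * comp_index."""
    return [c + stride * r for c, r in zip(col_codes, row_codes)]


def remap_nan_codes(codes, n_levels):
    """Hypothetical fix direction: move NaN (code -1) to its own slot at the end."""
    return [n_levels if code == -1 else code for code in codes]


def fill_grid(selector, values, n_rows, stride):
    """Scatter values into an n_rows x stride grid, rejecting invalid positions."""
    flat = [math.nan] * (n_rows * stride)
    for pos, v in zip(selector, values):
        if not 0 <= pos < len(flat):
            raise IndexError(f"selector {pos} outside 0..{len(flat) - 1}")
        flat[pos] = v
    return [flat[i * stride:(i + 1) * stride] for i in range(n_rows)]


# Assumed codes after sorting by (a, b): NaN row first, coded -1
row_codes = [-1, 0, 1, 2]     # NaN, R1, R2, R4
col_codes = [2, 0, 1, 3]      # C3, C1, C2, C4
values = [math.nan, 10, 15, 20]
stride = 4                    # number of distinct 'b' labels

print(make_selector(col_codes, row_codes, stride))  # [-2, 0, 5, 11]
fixed = make_selector(col_codes, remap_nan_codes(row_codes, 3), stride)
print(fixed)                                        # [14, 0, 5, 11]
grid = fill_grid(fixed, values, n_rows=4, stride=stride)  # rows R1, R2, R4, NaN
```

A negative selector does not name any cell of the grid, so fill_grid rejects it; after remapping, every value maps to a distinct in-range cell.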

I tried just slicing out anything == -1, but that broke about 20 tests, which makes me think there's a better way...

You don't need to go too deep, but any pointers would be much appreciated.
Thanks!

@jreback
Contributor

jreback commented Jun 26, 2014

bump?

@cpcloud
Member Author

cpcloud commented Jun 26, 2014

@jreback I have some notes on this; I can put them up later today, or this weekend at the latest.

@jreback
Contributor

jreback commented Jun 26, 2014

Sure... just going through issues.
