@OXPHOS OXPHOS commented Apr 26, 2017


if not fastpath:

    # Categories cannot contain NaN.
Contributor

Do you have some unintentional changes in here? This shouldn't be removed.

Contributor Author

@OXPHOS OXPHOS Apr 27, 2017

I think the Index, Index([None, u'A', u'B'], dtype='object'), needs to be passed to Categorical when building the MultiIndex, because with dropna=False, None can also be an index/column name. Or am I misunderstanding this?

table = table.fillna(value=fill_value, downcast='infer')

if margins:
    if dropna:
Contributor

If I remove this "if dropna" check, most of the tests pass (including a fix for this one), except for this one:

        import numpy as np
        import pandas as pd
        from pandas import Index

        df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                           'b': [3, 3, 4, 4, 4, 4]})
        actual = pd.crosstab(df.a, df.b, margins=True, dropna=False)
        expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 4, 6]])
        expected.index = Index([1.0, 2.0, 'All'], name='a')
        expected.columns = Index([3, 4, 'All'], name='b')

Here are the actual and expected results:

(Pdb) pp actual
b    3  4  All
a
1.0  1  0    1
2.0  1  3    4
All  2  3    5
(Pdb) pp expected
b    3  4  All
a
1.0  1  0    1
2.0  1  3    4
All  2  4    6

You have more experience with this section of the code than I do, but the margins on the expected look incorrect to me.
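One way to make the concern concrete is a quick consistency check: in a table with margins, the 'All' row should equal the column-wise sum of the rows shown above it. The sketch below uses only the numbers printed in the two tables; the helper names column_sums and margins_consistent are made up for illustration and are not part of pandas.

```python
# Body rows and 'All' margin rows, copied from the two printed tables.
body = [
    [1, 0, 1],  # a = 1.0
    [1, 3, 4],  # a = 2.0
]
actual_all = [2, 3, 5]    # 'All' row of `actual`
expected_all = [2, 4, 6]  # 'All' row of `expected`


def column_sums(rows):
    """Sum each column over the given rows."""
    return [sum(col) for col in zip(*rows)]


def margins_consistent(rows, all_row):
    """Return True when the margin row equals the column sums of the body."""
    return column_sums(rows) == all_row


print(margins_consistent(body, actual_all))    # True
print(margins_consistent(body, expected_all))  # False
```

The difference between the two margin rows is [0, 1, 1], which is exactly the contribution of the single record where a is NaN and b is 4. The expected frame counts that record in the totals but does not display its row.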

Contributor Author

You're definitely right. I think it should be (if dropna=False):

b       3  4  All
a
1.0     1  0    1
2.0     1  3    4
np.nan  0  1    1
All     2  4    6
  • I need to fix the np.nan case, as it is still being ignored when dropna=False even with the current fix (i.e. the fix only works for None)
  • I don't yet understand why removing the dropna check here would help. I'll take a closer look.
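For reference, the table proposed above can be reproduced without relying on how crosstab treats NaN keys, by relabeling the missing values before counting. This is only a sketch of a workaround, not the fix proposed in this PR: the 'NaN' string label and the a_labeled name are assumptions chosen for illustration, and all row labels are converted to strings so the index sorts cleanly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                   'b': [3, 3, 4, 4, 4, 4]})

# Assumption: replace missing values with the placeholder string 'NaN' so the
# row is kept as its own group; every label becomes a string.
a_labeled = df['a'].map(lambda v: 'NaN' if pd.isna(v) else str(v))

# No missing keys remain, so the margins include every record.
table = pd.crosstab(a_labeled, df['b'], margins=True)
print(table)
```

With the placeholder in place, the 'NaN' row should carry the single record with b equal to 4, and the 'All' row should total 2, 4 and 6.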


jreback commented Apr 27, 2017

this is doing similar things to changes in #12607

@TomAugspurger
Contributor

> this is doing similar things to changes in #12607

Ah I see. That is a much larger change than the original issue I was looking at :)

@OXPHOS OXPHOS changed the title from "Fix 14072 pivot_table dropna" to "BUG:Pivot table drops column/index names=nan when dropna=false" Apr 27, 2017

OXPHOS commented Apr 27, 2017

I think the change in Cython is definitely required. The problem is how to pass dropna to it without disturbing too many existing structures.
I just reset my development environment and am trying Anaconda Python 2.7. Interestingly, numerous tests fail for me even on the master branch, so I only ran the pivot and groupby tests locally. I'll look into it more over the weekend.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 27, 2017
@OXPHOS OXPHOS force-pushed the pivot_table_dropna branch from 8a2fcb0 to 99c240c Compare May 1, 2017 06:46

OXPHOS commented May 1, 2017

Some tests will fail, and many of the failures are actually separate problems. I have already located several and will update soon.


jreback commented Jun 10, 2017

can you rebase and update?


jreback commented Jul 26, 2017

needs a rebase. if you'd like to continue, pls comment.

@jreback jreback closed this Jul 26, 2017
Development

Successfully merging this pull request may close these issues.

pivot_table margins bottom-left total does not correspond to other content when dropna=False
