-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG:Pivot table drops column/index names=nan when dropna=false #16142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| if not fastpath: | ||
|
|
||
| # Categories cannot contain NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have some unintentional changes in here? This shouldn't be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the Index, Index([None, u'A', u'B'], dtype='object'), needs to be passed to Categorical when doing MultiIndex, as when dropna=False, None could also be the index/column name. Or I didn't get this correctly?
| table = table.fillna(value=fill_value, downcast='infer') | ||
|
|
||
| if margins: | ||
| if dropna: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this if dropna, most of the tests pass (including a fix for this one) other than
df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
'b': [3, 3, 4, 4, 4, 4]})
actual = pd.crosstab(df.a, df.b, margins=True, dropna=False)
expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 4, 6]])
expected.index = Index([1.0, 2.0, 'All'], name='a')
expected.columns = Index([3, 4, 'All'], name='b')Here's the result and expected
(Pdb) pp actual
b 3 4 All
a
1.0 1 0 1
2.0 1 3 4
All 2 3 5
(Pdb) pp expected
b 3 4 All
a
1.0 1 0 1
2.0 1 3 4
All 2 4 6
You have more experience with this section of the code than I do, but the margins on the expected look incorrect to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're definitely right. I think it should be (if dropna=False):
| b | 3 | 4 | All |
|---|---|---|---|
| a | |||
| 1.0 | 1 | 0 | 1 |
| 2.0 | 1 | 3 | 4 |
| np.nan | 0 | 1 | 1 |
| All | 2 | 4 | 6 |
- I need to fix the np.nan as it is still being ignored even with the current fix when
dropna=False(i.e. the fix only works forNone) - Actually I didn't get why removing
dropnahere would help yet. I'll check closer.
|
this is doing similar things to changes in #12607 |
Ah I see. That is a much larger change that the original issue I was looking at :) |
|
I think the change in |
|
Some tests will be failing and many are actually different/separate problems. I already located several and will update soon. |
|
can you rebase and update? |
|
needs a rebase. if you'd like to continue, pls comment. |
git diff upstream/master --name-only -- '*.py' | flake8 --diffclean-up of Pivot table drops column/index names=nan when dropna=false #14246