Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: margin=True is affected by NaN in other columns in pivot_table #33466

Open
3 tasks done
nikhilpatwardhan opened this issue Apr 10, 2020 · 2 comments
Open
3 tasks done
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@nikhilpatwardhan
Copy link

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

>>> import pandas as pd
>>> data = {'A': [6, 7, None], 'B': [None, 5, 8], 'name': ['Alice', 'Alice', 'Alice']}
>>> df = pd.DataFrame(data)
>>> df.pivot_table(index='name', values=['A', 'B'], aggfunc=['min', 'max'], margins=True)

Output:

       min       max     
         A    B    A    B
name                     
Alice  6.0  5.0  7.0  8.0
All    7.0  5.0  7.0  5.0

Problem description

The summary line 'All' shows min for A as 7.0 and max for B as 5.0 although min of A is 6.0 and max of B is 8.0

Expected Output

       min       max     
         A    B    A    B
name                     
Alice  6.0  5.0  7.0  8.0
All    6.0  5.0  7.0  8.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 6bc8a49
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
Version : Darwin Kernel Version 19.3.0: Thu Jan 9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1225.g6bc8a49f2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200325
Cython : 0.29.16
pytest : 5.4.1
hypothesis : 5.8.0
sphinx : 3.0.1
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

@nikhilpatwardhan nikhilpatwardhan added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 10, 2020
@dsaxton dsaxton added Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 12, 2020
@dsaxton
Copy link
Member

dsaxton commented Apr 15, 2020

This looks to be the source of the bug (it's dropping rows with any missing data after calculating the pivot table but before getting the margins):

data = data[data.notna().all(axis=1)]

which was itself added to fix a bug in crosstab: #12614. So somehow this would have to be fixed without breaking crosstab.

@mroeschke mroeschke added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 31, 2021
@theavey
Copy link
Contributor

theavey commented Aug 11, 2022

Still exists in pandas 1.4.3

>>> df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["a", "a", "a", "b"], "c": [1] * 4})
>>> df
	a	b	c
0	1.00	a	1
1	2.00	a	1
2	NaN	a	1
3	4.00	b	1
>>> df.pivot_table(index="b", values=["a", "c"], aggfunc="sum", margins=True)
	a	c
b		
a	3.00	3
b	4.00	1
All	7.00	3

Would instead expect

>>> df.pivot_table(index="b", values=["a", "c"], aggfunc="sum", margins=True)
	a	c
b		
a	3.00	3
b	4.00	1
All	7.00	4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants