crosstabs doesn't work with margin and normalize together #27500

Closed
min2bro opened this issue Jul 21, 2019 · 2 comments · Fixed by #27663

@min2bro
commented Jul 21, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

pd.crosstab([df.A, df.B], df.C, margins=True, margins_name='Sub-Total', normalize=0)

Problem description

pandas.crosstab:
According to the documentation for the normalize parameter, if margins is True the margin values are normalized as well. However, when I set normalize (normalize=0 in the sample above) together with margins=True and pass a custom string as margins_name, it raises the following exception:

KeyError: "['Sub-Total'] not found in axis"

where Sub-Total is the margins_name string.

Expected Output

Normalized values for the margin rows and columns.
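
For reference, the expected table can be built by hand from the un-normalized counts. This is only a sketch, under the assumption that normalize=0 should divide every row, including the Sub-Total row, by its own row total; the margin column is dropped first because each normalized row sums to 1. The names counts, body, and expected are illustrative and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})

# Raw counts with a margin row and a margin column, no normalization yet.
counts = pd.crosstab([df.A, df.B], df.C,
                     margins=True, margins_name="Sub-Total")

# Assumption: row-wise normalization divides each row (margin row included)
# by its own total, so the margin column is dropped before dividing.
body = counts.drop(columns="Sub-Total")
expected = body.div(body.sum(axis=1), axis=0)

print(expected)
```

The last row of expected holds the normalized Sub-Total values, which is what the failing call was supposed to return alongside the per-group rows.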

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.2.1
pip: 18.0
setuptools: 39.1.0
Cython: 0.26.1
numpy: 1.15.4
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.7
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None

@WillAyd

Member

commented Jul 21, 2019

I think the issue here is the combination of normalizing across the rows with a MultiIndex. Note that the following works:

pd.crosstab([df.A], df.C, margins=True, normalize=0)

But the call that creates a MultiIndex doesn't:

pd.crosstab([df.A, df.B], df.C, margins=True, normalize=0)

The source for that is here. If you'd like to investigate, a PR with a patch would certainly be welcome:

if normalize is not False:
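
The comparison in this comment can be checked on any installed pandas version with a small sketch like the one below. It runs the four combinations of single index or MultiIndex with the default or a custom margins_name, and records whether each call succeeds. The helper try_crosstab and the results dict are hypothetical names used for illustration; the outcome depends on the pandas version, so no particular result is assumed for the MultiIndex cases.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})


def try_crosstab(index, **kwargs):
    """Return 'ok' if the call succeeds, else the exception class name."""
    try:
        pd.crosstab(index, df.C, margins=True, normalize=0, **kwargs)
        return "ok"
    except Exception as exc:  # broad on purpose: we only report the outcome
        return type(exc).__name__


results = {
    "single index, default name": try_crosstab([df.A]),
    "MultiIndex, default name": try_crosstab([df.A, df.B]),
    "single index, custom name": try_crosstab([df.A],
                                              margins_name="Sub-Total"),
    "MultiIndex, custom name": try_crosstab([df.A, df.B],
                                            margins_name="Sub-Total"),
}

print("pandas", pd.__version__)
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

On an affected version, a KeyError in one of the rows narrows down which combination triggers the failure.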

@charlesdong1991

Contributor

commented Jul 30, 2019

I think I found the bug. I will submit a PR tonight.
