crosstabs doesn't work with margin and normalize together #27500

min2bro opened this issue Jul 21, 2019 · 2 comments


@min2bro min2bro commented Jul 21, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})


Problem description

The documentation for the normalize parameter says: "If margins is True, will also normalize margin values." However, when I pass normalize=True and margins=True with margins_name set to a string, the following exception is raised:

KeyError: "['Sub-Total'] not found in axis"

where Sub-Total is the margins_name string.
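The sample above stops before the failing call, so here is a minimal sketch of what a reproduction might look like. The exact arguments are an assumption: a two-level index (`A`, `B`), `margins_name="Sub-Total"` taken from the error message, and `normalize=0` chosen for illustration. The helper `try_crosstab` is hypothetical and exists only so the snippet runs on both affected and fixed pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})


def try_crosstab(index):
    """Hypothetical helper: return the crosstab, or the KeyError it raised."""
    try:
        return pd.crosstab(
            index,
            df["C"],
            margins=True,
            margins_name="Sub-Total",  # name taken from the reported error
            normalize=0,               # assumed; the report only says "normalize"
        )
    except KeyError as exc:
        # Reported behaviour on pandas 0.24.2 for the two-level index.
        return exc


result = try_crosstab([df["A"], df["B"]])
print(type(result).__name__)
print(result)
```

On an affected version this should print the `KeyError`; on a version where the bug is fixed it should print the normalized table instead.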

Expected Output

Normalized values for the margin rows and columns.

Output of pd.show_versions()


commit: None
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.2.1
pip: 18.0
setuptools: 39.1.0
Cython: 0.26.1
numpy: 1.15.4
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.7
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


@WillAyd WillAyd commented Jul 21, 2019

I think the issue here is the combination of normalize across the rows with a MultiIndex. Note if you did the following this works:

pd.crosstab([df.A],df.C, margins=True, normalize=0)

But with the MultiIndex you are creating, it doesn't:

pd.crosstab([df.A, df.B],df.C, margins=True, normalize=0)

The source for that is here. If you'd like to investigate, a PR to patch it would certainly be welcome:

if normalize is not False:
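Until a patch is available, one possible workaround is to skip `normalize` entirely, ask `crosstab` for raw counts with margins, and divide each row by its own total. This is only a sketch under stated assumptions: it is not how pandas normalizes internally and not the eventual fix, and the `"Sub-Total"` name and two-level index are carried over from the report.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})

# Raw counts with a margin row and margin column; no normalize argument.
counts = pd.crosstab(
    [df["A"], df["B"]],
    df["C"],
    margins=True,
    margins_name="Sub-Total",
)

# Divide every row (including the margin row) by its row total,
# then drop the margin column, which would be 1.0 everywhere.
row_norm = counts.div(counts["Sub-Total"], axis=0).drop(columns="Sub-Total")

print(row_norm)
```

Each row of `row_norm` sums to 1, and the last row holds the overall share of each `C` value. A row whose total is zero would produce NaN here, so real data may need a `fillna(0)` afterwards.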



@charlesdong1991 charlesdong1991 commented Jul 30, 2019

I think I found the bug; I will submit a PR tonight.

