[BUG] Groupby with as_index=False raises error when type is Category. #32599

Closed
amineKammah opened this issue Mar 10, 2020 · 1 comment · Fixed by #34767
Labels: Bug, Categorical (Categorical Data Type), Groupby, Regression (Functionality that used to work in a prior pandas version)

amineKammah commented Mar 10, 2020

Code to reproduce the issue

import pandas as pd

test = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=['col1', 'col2'])
test['col1'] = test['col1'].astype('category')

test.groupby(['col1', 'col2'], as_index=False).size()

Problem description

With pandas 1.0.1, the code raises ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>.
With pandas 0.25.3, the code runs, but the as_index argument has no effect, as already reported in #25011.
The error occurs only with the categorical dtype; with other dtypes, the output of 1.0.1 is similar to that of 0.25.3.
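
A possible workaround is to group with the default as_index=True and flatten the result afterwards. The sketch below is not taken from this thread and has not been checked against 1.0.1 specifically; the helper name size_as_frame and the column name size are made up for illustration.

```python
import pandas as pd


def size_as_frame(df, keys):
    """Group sizes as a flat DataFrame, without using as_index=False.

    Hypothetical helper: "size" is an arbitrary name for the count column.
    """
    # observed=False is passed explicitly so that unobserved categories of a
    # categorical key are requested regardless of the version's default.
    sizes = df.groupby(keys, observed=False).size()
    # Turn the group keys from index levels back into ordinary columns.
    return sizes.reset_index(name="size")


test = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=["col1", "col2"])
plain = size_as_frame(test, ["col1", "col2"])

test["col1"] = test["col1"].astype("category")
categorical = size_as_frame(test, ["col1", "col2"])

print(plain)
print(categorical)
```

With integer keys only the combinations that occur in the data are listed. With the categorical key, recent pandas versions should also list the combinations that never occur, with a size of 0.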

Expected Output

   col1  col2  0
0     1     1  1
1     1     2  0
2     1     3  0
3     2     1  0
4     2     2  1
5     2     3  0
6     3     1  0
7     3     2  0
8     3     3  1
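
The expected table can be read as the Cartesian product of the categories of col1 with the values of col2, where combinations that never occur get a count of 0. Below is a small pandas-free sketch of that counting rule, using only the three rows from the reproduction. The variable names are illustrative, and this is not a description of how pandas implements it.

```python
from collections import Counter
from itertools import product

# (col1, col2) pairs from the reproduction above.
rows = [(1, 1), (2, 2), (3, 3)]

# Categories of col1 and distinct values of col2, in sorted order.
col1_categories = sorted({c1 for c1, _ in rows})
col2_values = sorted({c2 for _, c2 in rows})

# How often each pair actually occurs; a Counter reports 0 for missing pairs.
observed = Counter(rows)

# One output row per combination, including the ones that never occur.
expected = [
    (c1, c2, observed[(c1, c2)])
    for c1, c2 in product(col1_categories, col2_values)
]

for row in expected:
    print(*row)
```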

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

simonjayhawkins added the Regression (Functionality that used to work in a prior pandas version) label Apr 24, 2020
mroeschke added the Bug and Categorical (Categorical Data Type) labels May 11, 2020
simonjayhawkins (Member) commented

> With pandas 1.0.1, the code throws an error

regression in #29690 (i.e. 1.0.0)

c5a1f9e is the first bad commit
commit c5a1f9e
Author: Oliver Hofkens <oliver@novemberfive.co>
Date:   Wed Nov 20 13:46:18 2019 +0100

    BUG: Series groupby does not include nan counts for all categorical labels (#17605) (#29690)
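
A regression check along the following lines might guard against this. It is only a sketch, not the test added to pandas. It assumes a pandas version that contains the fix, and it assumes that size() with as_index=False there returns a DataFrame whose count column is named size. The function name is hypothetical.

```python
import pandas as pd


def check_size_with_categorical_key():
    """Hypothetical regression check; not the test added to pandas."""
    df = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=["col1", "col2"])
    df["col1"] = df["col1"].astype("category")

    # This call raised ValueError on pandas 1.0.1.
    result = df.groupby(["col1", "col2"], as_index=False, observed=False).size()

    # Assumption: on a fixed version the result is a DataFrame whose count
    # column is named "size".
    assert isinstance(result, pd.DataFrame)
    counts = {
        (int(c1), int(c2)): int(n)
        for c1, c2, n in zip(result["col1"], result["col2"], result["size"])
    }
    # Each observed (col1, col2) pair occurs once; any other listed pair is 0.
    for (c1, c2), n in counts.items():
        assert n == (1 if c1 == c2 else 0)
    assert {(1, 1), (2, 2), (3, 3)} <= set(counts)
    return result


check_size_with_categorical_key()
```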
