GroupBy aggregation of DataFrame with MultiIndex columns breaks with custom function #31777

Closed
sbitzer opened this issue Feb 7, 2020 · 4 comments · Fixed by #32040
Labels
Groupby, Regression (Functionality that used to work in a prior pandas version)

Comments

sbitzer commented Feb 7, 2020

Code Sample, a copy-pastable example if possible

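import numpy as np
import pandas as pd

# 10 x 4 frame whose columns are a two-level MultiIndex: (1, 3), (1, 4), (2, 3), (2, 4)
df = pd.DataFrame(
    np.random.rand(10, 4),
    columns=pd.MultiIndex.from_product([[1, 2], [3, 4]]))
# Two groups of five rows each
grp = df.groupby(np.r_[np.ones(5), np.zeros(5)])
grp.agg(lambda s: s.mean())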

Problem description

The above call raises

ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements

because

result.columns = Index(
    result.columns.levels[0], name=self._selected_obj.columns.name
)

assumes that the original columns were a flat Index rather than a MultiIndex. Doing

grp.agg('mean')

works as expected (result with MultiIndex columns).
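To illustrate where the 4-versus-2 mismatch comes from, here is a minimal sketch that does not go through groupby at all. It only compares the number of columns with the number of labels in the first level and then attempts the same kind of assignment on a copy of the frame. The variable names are illustrative and not taken from the pandas source.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([[1, 2], [3, 4]])
df = pd.DataFrame(np.random.rand(10, 4), columns=columns)

# One column per (outer, inner) pair: 4 columns in total.
n_columns = len(df.columns)
# Level 0 only holds the unique outer labels: 2 entries.
n_level0 = len(df.columns.levels[0])
print(n_columns, n_level0)

# Assigning the 2 outer labels as the column index of a 4-column frame
# fails with a length mismatch, like the error reported above.
message = None
attempt = df.copy()
try:
    attempt.columns = pd.Index(df.columns.levels[0])
except ValueError as err:
    message = str(err)
print(message)
```

With a flat Index the first level of the intermediate result would have one entry per original column, which is presumably why the snippet works in that case.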

Expected Output

That of

grp.agg('mean')

Output of pd.show_versions()

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.4
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

@jorisvandenbossche added the Groupby and Regression (Functionality that used to work in a prior pandas version) labels Feb 7, 2020
@jorisvandenbossche added this to the 1.0.2 milestone Feb 7, 2020
@jorisvandenbossche
Member

@sbitzer Thanks for the report!

cc @jbrockmendel

@jorisvandenbossche
Member

@jbrockmendel I think this one is caused by #28203. That PR moved the Index creation (the snippet that @sbitzer shows above) outside of the try/except block, and it is this step that is failing.

@jbrockmendel
Member

OK. Is that incorrect index-creation call something we can check for and avoid, or does it need to be inside a try/except?
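As a rough sketch of the "check for and avoid" option, one possibility is to drop only the innermost column level instead of taking `levels[0]`. This assumes the intermediate result carries the original columns plus one extra innermost level holding the function name (here `"<lambda>"`); that shape is an assumption made for illustration and is built by hand below. It is not necessarily what the eventual fix in #32040 does.

```python
import pandas as pd

# Original MultiIndex columns from the report.
original = pd.MultiIndex.from_product([[1, 2], [3, 4]])

# Hypothetical intermediate columns: each original column tuple with an
# extra innermost level for the function name (an assumption, see above).
intermediate = pd.MultiIndex.from_tuples(
    [(*col, "<lambda>") for col in original]
)

# Dropping the last level recovers the original columns regardless of
# how many levels they had.
restored = intermediate.droplevel(-1)
print(restored.equals(original))

# The same call also covers the flat case.
flat_original = pd.Index(["a", "b"])
flat_intermediate = pd.MultiIndex.from_product([flat_original, ["<lambda>"]])
flat_restored = flat_intermediate.droplevel(-1)
print(flat_restored.equals(flat_original))
```

Under that assumption no try/except is needed, since the same operation handles both flat and MultiIndex columns.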

@MarcoGorelli
Member

take
