BUG: GroupBy dictionary aggregation raises ValueError when 'as_index=False' #39103

Open
2 of 3 tasks
dchigarev opened this issue Jan 11, 2021 · 0 comments
Labels: Apply (Apply, Aggregate, Transform), Bug, Groupby


dchigarev commented Jan 11, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas

# Could be any data with MultiIndex columns
data = {(f"col{i}", f"level-{i % 2}"): np.arange(5) for i in range(5)}
df = pandas.DataFrame(data)

by = df.columns[0]
agg_col = df.columns[1]

agg_dict = {agg_col: ["min", "max"]}
df.groupby(by, as_index=False).agg(func=agg_dict)  # ValueError

Traceback

Traceback (most recent call last):
  File "../exp.py", line 12, in <module>
    df.groupby(by, as_index=False).agg(func=agg_dict)  # ValueError
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 990, in aggregate
    self._insert_inaxis_grouper_inplace(result)
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1614, in _insert_inaxis_grouper_inplace
    result.insert(0, name, lev)
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/frame.py", line 3760, in insert
    self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1197, in insert
    new_axis = self.items.insert(loc, item)
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 3801, in insert
    item = self._validate_fill_value(item)
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 3784, in _validate_fill_value
    raise ValueError("Item must have length equal to number of levels.")
ValueError: Item must have length equal to number of levels.

Problem description

When aggregation functions are passed as a list, pandas adds a new level holding the function names to the columns of the resulting frame:

>>> df
   col0  col1
0     0     0
>>> df.groupby("col0").agg({"col1": ["max"]})
     col1
      max
col0
0       0

As a result, the key that identifies the grouper column and the columns of the aggregated frame have a different number of levels, which breaks _insert_inaxis_grouper_inplace. When the grouper comes from MultiIndex columns and pandas tries to insert it back into the resulting frame, the key's length does not match the frame's columns.nlevels, so pandas.DataFrame.insert raises a ValueError.
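
The mismatch can be reproduced on the column index alone, without any groupby. The following is a minimal sketch, assuming the aggregated columns have three levels (the two original levels plus the function-name level); the `result_columns` values are made up for illustration:

```python
import pandas as pd

# Columns as they might look after the aggregation: the two original
# levels plus the level with the function names (illustrative values).
result_columns = pd.MultiIndex.from_tuples(
    [("col1", "level-1", "min"), ("col1", "level-1", "max")]
)

# A scalar key is padded with empty strings up to nlevels.
padded = result_columns.insert(0, "col0")
print(padded[0])

# A tuple key whose length differs from nlevels is rejected.
error_message = None
try:
    result_columns.insert(0, ("col0", "level-0"))
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The second call takes the same path as the last frames of the traceback above: a two-element key is inserted into a three-level MultiIndex.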

Filling the missing levels with an empty string (as pandas already does when the grouping key does not come from MultiIndex columns) would probably solve this issue:

>>> df.groupby("col0", as_index=False).agg({"col1": ["max"]}).columns
MultiIndex([('col0',    ''),
            ('col1', 'max')],
           )
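
The padding idea can be sketched in user code. This is only an illustration, not the pandas implementation: `pad_key` is a hypothetical helper, and the example assumes that aggregating with the default as_index=True succeeds, so the group keys can then be inserted by hand under a padded column key.

```python
import numpy as np
import pandas as pd


def pad_key(key, nlevels, fill=""):
    """Hypothetical helper: extend a column key with `fill` up to `nlevels` entries."""
    key = key if isinstance(key, tuple) else (key,)
    return key + (fill,) * (nlevels - len(key))


# Same data as in the code sample above.
data = {(f"col{i}", f"level-{i % 2}"): np.arange(5) for i in range(5)}
df = pd.DataFrame(data)
by, agg_col = df.columns[0], df.columns[1]

# Aggregate with the default as_index=True, so the group keys stay in the index.
result = df.groupby(by).agg({agg_col: ["min", "max"]})

# Insert the group keys as a column under a key padded to the result's nlevels.
padded_key = pad_key(by, result.columns.nlevels)
result.insert(0, padded_key, result.index.to_numpy())
result = result.reset_index(drop=True)
print(result.columns.tolist())
```

A fix inside pandas would presumably apply the same padding to the key before calling insert, but where exactly that belongs is up to the maintainers.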

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 3e89b4c4b1580aa890023fc550774e63d499da25
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-50-generic
Version          : #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.0
numpy            : 1.19.0
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.3.1.post20200622
Cython           : None
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : 0.4.1
xlsxwriter       : None
lxml.etree       : 4.5.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : None
fsspec           : 0.7.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : 0.13.2
pyarrow          : 1.0.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : None
xarray           : 0.15.1
xlrd             : 1.2.0
xlwt             : None
numba            : None

dchigarev added the Bug and Needs Triage labels on Jan 11, 2021
simonjayhawkins added the Groupby label and removed the Needs Triage label on Jan 22, 2021
simonjayhawkins added this to the Contributions Welcome milestone on Jan 22, 2021
mroeschke added the Apply label on Aug 15, 2021
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022

3 participants