Groupby with function returning a tuple fails if index has a name #18696

Open
achabotl opened this issue Dec 8, 2017 · 8 comments

Comments

achabotl commented Dec 8, 2017

Code Sample

import numpy as np
import pandas as pd
print(pd.__version__)
idx = pd.date_range('2017-12-08', periods=6, freq='10D')
df = pd.DataFrame(np.arange(6), index=idx)
df.index.name = 'name'
# The grouping function returns a (year, month) tuple for each index value
df.groupby(lambda x: (x.year, x.month)).mean()

Problem description

In 0.21.0 (and 0.20.x), calling groupby with a function that returns a tuple, on a DataFrame whose index has a name, fails with ValueError: Names should be list-like for a MultiIndex. The failure seems to be caused by the index having a name.

The sample works in 0.19.0, with the output below. The same code without df.index.name = 'name' works under 0.20 and 0.21.

Expected Output

            0
(2017, 12)  1
(2018, 1)   4
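
For anyone hitting this on an affected version, here is a minimal sketch of one way to get the same year/month groups without a function that returns a tuple. It passes two index-derived keys instead. The level names "year" and "month" are illustrative assumptions, not part of the original report, and the result is indexed by a two-level MultiIndex rather than by tuples.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-12-08", periods=6, freq="10D")
df = pd.DataFrame(np.arange(6), index=idx)
df.index.name = "name"

# Group by two keys derived from the index instead of a tuple-returning function.
# Renaming each key avoids both result levels inheriting the index name "name".
keys = [df.index.year.rename("year"), df.index.month.rename("month")]
result = df.groupby(keys).mean()
print(result)
```

If index entries that are literal tuples are required, this does not reproduce them exactly; it only yields the same groups and means.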

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 38.2.4
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented Dec 9, 2017

I believe this is correct, even though it looks a bit weird. You are passing a callable.

e.g.

by multiple indices

In [10]: df.groupby([df.index.year, df.index.month]).mean()
Out[10]: 
           0
name name   
2017 12    1
2018 1     4

by an index

In [14]: df.groupby(df.index).mean()
Out[14]: 
            0
name         
2017-12-08  0
2017-12-18  1
2017-12-28  2
2018-01-07  3
2018-01-17  4
2018-01-27  5

by an array

In [15]: df.groupby(df.index.values).mean()
Out[15]: 
            0
2017-12-08  0
2017-12-18  1
2017-12-28  2
2018-01-07  3
2018-01-17  4
2018-01-27  5
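
The difference between the last two outputs above is that grouping by the Index carries its name into the result, while grouping by the raw values does not. A small sketch that checks this directly; the variable names by_index and by_array are made up for illustration, and the behaviour is what the outputs above show rather than a documented guarantee:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-12-08", periods=6, freq="10D")
df = pd.DataFrame(np.arange(6), index=idx)
df.index.name = "name"

# Grouping by the Index object: the key has a name, so the result index keeps it.
by_index = df.groupby(df.index).mean()

# Grouping by a plain NumPy array: the key has no name, so the result index is unnamed.
by_array = df.groupby(df.index.values).mean()

print(by_index.index.name)
print(by_array.index.name)
```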

jreback added the Groupby label Dec 9, 2017

Contributor

jreback commented Dec 9, 2017

related to #11579

@chris-b1 @TomAugspurger

Contributor

jreback commented Dec 9, 2017

we might just need to doc this

jreback added this to the Next Major Release milestone Dec 9, 2017

Contributor

jreback commented Dec 9, 2017

@achabotl a couple of examples in the doc-string might be helpful here. I'll mark it as such.

@VincentLa
Contributor

I can potentially work on this during Pycon Sprints (2018)!

@leeviannala

Should this be closed since it works in 0.23.4?


mbirdi commented Sep 1, 2018

I wrote an example notebook for myself documenting the index-name-groupby behavior in 0.22.0. I can post that notebook if people think it's helpful.

I may need a little help posting the notebook. I don't use github much and am stuck. I tried to use nbviewer.jupyter.org to post the notebook, but I guess that website renders the notebook, and does not host it. So I want to use github to host the notebook and nbviewer to render it, but I must be missing something.

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022


asdinara commented Apr 20, 2023

After reading all the comments, it seems that the issue is resolved and a test has been created.
