groupby.apply modifies the index of an empty series #21192

Open
jluttine opened this issue May 24, 2018 · 3 comments
Labels
Apply (Apply, Aggregate, Transform) · Bug · Groupby

Comments

@jluttine

Code Sample, a copy-pastable example if possible

Correct behaviour for non-empty series - The index is kept unchanged:

>>> pd.Series(index=pd.DatetimeIndex(["2018-01-01"]), data=[10]).groupby([1]).apply(lambda x: x).index
DatetimeIndex(['2018-01-01'], dtype='datetime64[ns]', freq=None)

Incorrect behaviour for empty series - The index is changed:

>>> pd.Series(index=pd.DatetimeIndex([]), data=[]).groupby([]).apply(lambda x: x).index
Float64Index([], dtype='float64')

Problem description

The index should remain unchanged.

Why does this matter at all?

  • Operations that work on a datetime index but not on a float index can't be applied to the result, for instance .loc["2018-01-01":]
  • I'm using unit tests to check that the series is what is expected, and they now fail because the index has an unexpected type.
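To illustrate the second point, here is a minimal sketch of how a test-style comparison reacts to the index type change. It does not call groupby at all: the float index below is only a stand-in for the index reported above, and the helper name index_matches is made up for this example.

```python
import pandas as pd

# The index type the caller expects to get back.
expected = pd.DatetimeIndex([])

# Stand-in for the index reported in this issue (an empty float64 index).
actual = pd.Index([], dtype="float64")


def index_matches(left, right):
    """Hypothetical helper: True if pandas' own index assertion passes."""
    try:
        pd.testing.assert_index_equal(left, right)
    except AssertionError:
        return False
    return True


# Both indexes are empty, but their types differ, so the check fails.
mismatch_caught = not index_matches(actual, expected)

# An empty DatetimeIndex compared with itself passes.
same_type_ok = index_matches(pd.DatetimeIndex([]), expected)
```

So even with no data at all, a strict index comparison in a test suite can tell the two results apart.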

Expected Output

Expected behaviour for empty series:

>>> pd.Series(index=pd.DatetimeIndex([]), data=[]).groupby([]).apply(lambda x: x).index
DatetimeIndex([], dtype='datetime64[ns]', freq=None)
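Until this is fixed, one possible workaround is to skip the groupby for empty input. The sketch below is not a fix for pandas itself, and apply_keep_index is a hypothetical helper name. It assumes the applied function returns each group with the same shape (as lambda x: x does), so that returning the empty input unchanged is the right answer.

```python
import pandas as pd


def apply_keep_index(series, keys, func):
    """Hypothetical helper: groupby-apply that returns empty input unchanged."""
    if series.empty:
        # Nothing to group, so hand back a copy; the index type is untouched.
        return series.copy()
    # group_keys=False keeps the group keys out of the result index.
    return series.groupby(keys, group_keys=False).apply(func)


empty = pd.Series([], index=pd.DatetimeIndex([]), dtype="float64")
result = apply_keep_index(empty, [], lambda x: x)

# The index is still a DatetimeIndex because groupby was never called.
index_type = type(result.index).__name__
```

This only sidesteps the problem for shape-preserving functions; an aggregating function would need a different empty result.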

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.42
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.22.0
pytest: None
pip: None
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: 2.5.2
xlrd: 0.9.4
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@mroeschke
Member

mroeschke commented May 26, 2018

Thanks for the report. Sounds reasonable, and I can replicate this on the latest release. Investigation and PRs welcome!

@rhshadrach
Member

For an Index, I think it's relatively straightforward to solve this. However, for a MultiIndex I don't know of a direct way to create an empty MultiIndex with a specified dtype for each of the levels. The only way I can figure out how to do this is to create a non-empty DataFrame with the specified types, and then subset it so that it becomes empty; e.g.

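import pandas as pd

# Build a one-row frame so that both index levels get the desired dtype.
df = pd.DataFrame(
  {
    'a': pd.DatetimeIndex(["2018-01-01"]),
    'b': pd.DatetimeIndex(["2018-01-01"]),
    'c': 1
  }
).set_index(['a', 'b'])
# Filter out every row; the empty frame keeps the typed MultiIndex levels.
df = df[df.c == 0]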

Is there a better way, even internally?
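One possibility, offered as a sketch rather than a confirmed answer: the MultiIndex constructor accepts levels and codes directly, so passing empty typed levels together with empty codes appears to give an empty MultiIndex whose levels keep their dtype. The names 'a' and 'b' mirror the example above.

```python
import pandas as pd

# Empty levels carry the dtype; empty codes mean the index has no rows.
mi = pd.MultiIndex(
    levels=[pd.DatetimeIndex([]), pd.DatetimeIndex([])],
    codes=[[], []],
    names=["a", "b"],
)

# Each level is still a DatetimeIndex even though the MultiIndex is empty.
level_types = [type(level).__name__ for level in mi.levels]
```

Whether this is suitable inside the groupby internals is a separate question that this sketch does not address.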

@RobbieClarken

RobbieClarken commented Dec 15, 2020

I'm seeing something similar with an empty DataFrame in the latest pandas (v1.1.5):

>>> import pandas as pd
>>> pd.__version__
'1.1.5'
>>> df = pd.DataFrame([], columns=["A", "B"])
>>> df
Empty DataFrame
Columns: [A, B]
Index: []
>>> df.groupby("A", group_keys=False).apply(lambda g: g)
Empty DataFrame
Columns: []
Index: []

I would expect the groupby.apply to preserve the columns of the empty DataFrame. I haven't checked to see whether #34998 fixes this.
