groupby.apply modifies the index of an empty series #21192

jluttine · 2018-05-24T13:31:04Z

Code Sample, a copy-pastable example if possible

Correct behaviour for non-empty series - The index is kept unchanged:

>>> pd.Series(index=pd.DatetimeIndex(["2018-01-01"]), data=[10]).groupby([1]).apply(lambda x: x).index
DatetimeIndex(['2018-01-01'], dtype='datetime64[ns]', freq=None)

Incorrect behaviour for empty series - The index is changed:

>>> pd.Series(index=pd.DatetimeIndex([]), data=[]).groupby([]).apply(lambda x: x).index
Float64Index([], dtype='float64')

Problem description

The index should remain unchanged.

Why does this matter at all?

I can't perform operations on the result that work with a datetime index but not with a float index, for instance `.loc["2018-01-01":]`.
I'm using unit tests to check that the series is what is expected, and now it isn't, because the index has an unexpected type.
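Both use cases depend on the index staying datetime64. The sketch below builds the expected empty result directly, not through `groupby.apply`. It shows the date-string slicing working and an index check that a unit test could make with `pd.testing.assert_index_equal`:

```python
import pandas as pd

# An empty Series whose index is explicitly a DatetimeIndex
empty = pd.Series([], index=pd.DatetimeIndex([]), dtype="float64")

# Date-string label slicing works as long as the index stays datetime64
sliced = empty.loc["2018-01-01":]
print(len(sliced))  # 0

# A unit test can pin the expected index (class and dtype) explicitly
pd.testing.assert_index_equal(sliced.index, pd.DatetimeIndex([]))
```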

Expected Output

Expected behaviour for empty series:

>>> pd.Series(index=pd.DatetimeIndex([]), data=[]).groupby([]).apply(lambda x: x).index
DatetimeIndex([], dtype='datetime64[ns]', freq=None)
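Until the index is preserved, one possible workaround is to bypass `groupby.apply` when the input is empty. This is a minimal sketch. It assumes the applied function returns data indexed like its input, as `lambda x: x` does. `apply_keep_empty_index` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd


def apply_keep_empty_index(series, by, func):
    """Hypothetical workaround for empty input to groupby.apply.

    An empty Series has no groups, so instead of calling groupby.apply
    we return an empty copy that keeps the original index and its dtype.
    Note: `func` is never called for empty input.
    """
    if series.empty:
        return series.iloc[0:0].copy()
    # group_keys=False so that group labels are not added to the
    # result's index when `func` returns an object indexed like its input
    return series.groupby(by, group_keys=False).apply(func)


empty = pd.Series([], index=pd.DatetimeIndex([]), dtype="float64")
result = apply_keep_empty_index(empty, [], lambda x: x)
print(type(result.index).__name__)  # DatetimeIndex

non_empty = pd.Series([10], index=pd.DatetimeIndex(["2018-01-01"]))
print(apply_keep_empty_index(non_empty, [1], lambda x: x).index)
```

Because `func` is skipped for empty input, this only matches `apply` for functions that preserve the shape and dtype of their input.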

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.42
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.22.0
pytest: None
pip: None
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: 2.5.2
xlrd: 0.9.4
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


mroeschke · 2018-05-26T01:27:31Z

Thanks for the report. Sounds reasonable and I can replicate this on the latest release. Investigation and PRs welcome!

rhshadrach · 2020-09-20T18:54:06Z

For an Index, I think it's relatively straightforward to solve this. However, for a MultiIndex I don't know of a direct way to create an empty MultiIndex with specified dtypes for each of the levels. The only way I can figure out how to do this is to create a non-empty DataFrame with the specified types, and then subset it so that it becomes empty; e.g.

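import pandas as pd

# Non-empty frame whose index levels 'a' and 'b' are datetime64[ns]
df = pd.DataFrame(
    {
        'a': pd.DatetimeIndex(["2018-01-01"]),
        'b': pd.DatetimeIndex(["2018-01-01"]),
        'c': 1
    }
).set_index(['a', 'b'])
# Filter out every row; the empty MultiIndex keeps its datetime levels
df = df[df.c == 0]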

Is there a better way, even internally?
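One possible alternative that avoids building and then filtering a non-empty frame is to pass the empty levels to the `MultiIndex` constructor directly. This is a sketch, not necessarily what pandas would do internally. The `codes` keyword is the name used since pandas 0.24; earlier versions called it `labels`:

```python
import numpy as np
import pandas as pd

# Pass the (empty) levels straight to the MultiIndex constructor, so no
# dtype has to be inferred from values and each level stays datetime64.
empty_mi = pd.MultiIndex(
    levels=[pd.DatetimeIndex([]), pd.DatetimeIndex([])],
    codes=[[], []],
    names=["a", "b"],
)

# Empty frame on that index, with an int64 column "c"
df = pd.DataFrame({"c": np.array([], dtype="int64")}, index=empty_mi)

print([type(level).__name__ for level in df.index.levels])
```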

RobbieClarken · 2020-12-15T09:26:57Z

I'm seeing something similar with an empty DataFrame in the latest pandas (v1.1.5):

>>> import pandas as pd
>>> pd.__version__
'1.1.5'
>>> df = pd.DataFrame([], columns=["A", "B"])
>>> df
Empty DataFrame
Columns: [A, B]
Index: []
>>> df.groupby("A", group_keys=False).apply(lambda g: g)
Empty DataFrame
Columns: []
Index: []

I would expect the groupby.apply to preserve the columns of the empty DataFrame. I haven't checked to see whether #34998 fixes this.
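The expectation for DataFrames can be sketched with a guard on `GroupBy.ngroups`; the helper name `groupby_apply_frame` is hypothetical. It checks the number of groups rather than emptiness, so it also covers a non-empty frame whose keys are all NaN. `func` is not called in the fallback case, so the result only matches `apply` for functions that return their input unchanged, such as `lambda g: g` above:

```python
import pandas as pd


def groupby_apply_frame(df, key, func):
    """Hypothetical helper: fall back to an empty slice when there are no groups."""
    grouped = df.groupby(key, group_keys=False)
    # ngroups is 0 for an empty frame (and also when every key is NaN,
    # since NaN keys are dropped by default). In that case return an
    # empty slice of the input, which keeps its columns and dtypes.
    if grouped.ngroups == 0:
        return df.iloc[0:0]
    return grouped.apply(func)


df = pd.DataFrame([], columns=["A", "B"])
result = groupby_apply_frame(df, "A", lambda g: g)
print(list(result.columns))  # ['A', 'B']
```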

mroeschke added the Bug, Groupby and Apply (Apply, Aggregate, Transform) labels on May 26, 2018

jluttine mentioned this issue May 29, 2018

Apply method broken for empty integer series with datetime index #21245

Closed
