Resampling on index and column after groupby give different results #30057

danguetta · 2019-12-04T17:58:35Z

Code Sample, a copy-pastable example if possible

This should be copy-pastable and self-contained:

import pandas as pd

def get_df():
    # Build a fresh copy of the example frame for each case below
    return pd.DataFrame({
        'DATETIME': pd.to_datetime(['2018-01-01 11:25:00', '2018-01-01 11:50:00',
                                    '2018-01-03 10:30:00', '2018-01-04 10:25:00'] * 2),
        'GROUP':    ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'FILTER':   [True, True, True, True, False, False, True, True],
        'X':        [1, 2, 3, 4, 5, 6, 7, 8],
    })

df = get_df()
df = df.set_index('DATETIME')
df.groupby('GROUP').resample('D').X.sum()                               # <-- LINE A
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01     3
#        2018-01-02     0
#        2018-01-03     3
#        2018-01-04     4
# B      2018-01-01    11
#        2018-01-02     0
#        2018-01-03     7
#        2018-01-04     8
# Name: X, dtype: int64

df = get_df()
df.groupby('GROUP').resample('D', on = 'DATETIME').X.sum()               # <-- LINE B
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01    10
# B      2018-01-03    11
#        2018-01-04    15
# Name: X, dtype: int64

df = get_df()
df = df.set_index('DATETIME')
df[df.FILTER].groupby('GROUP').resample('D').X.sum()                     # <-- LINE C
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01    3
#        2018-01-02    0
#        2018-01-03    3
#        2018-01-04    4
# B      2018-01-03    7
#        2018-01-04    8
# Name: X, dtype: int64

df = get_df()
df[df.FILTER].groupby('GROUP').resample('D', on = 'DATETIME').X.sum()   # <-- LINE D
# Error
# -----
#    IndexError: index 6 is out of bounds for size 6

Problem description

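Lines A and B are identical except that A resamples on the index, while B resamples on a column holding the same values (`on='DATETIME'`). Yet A returns one row per group per day (with 0 for empty days), while B returns only three rows with different sums.

Lines C and D differ in the same way, but on a filtered frame: C returns a result, while D raises an `IndexError`.

I would expect resampling on the index and resampling on an identical column to behave the same way.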
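For anyone hitting this in the meantime, one possible way to sidestep `resample(..., on=...)` is to group on the column with `pd.Grouper`. This is only a sketch of a workaround, not a fix for the behavior reported here, and it rebuilds the same example frame so that it runs on its own. The names `filtered` and `by_day` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'DATETIME': pd.to_datetime(['2018-01-01 11:25:00', '2018-01-01 11:50:00',
                                '2018-01-03 10:30:00', '2018-01-04 10:25:00'] * 2),
    'GROUP':    ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'FILTER':   [True, True, True, True, False, False, True, True],
    'X':        [1, 2, 3, 4, 5, 6, 7, 8],
})

# Same filter as in lines C and D
filtered = df[df.FILTER]

# Group by GROUP and by calendar day of the DATETIME column in a single groupby,
# without going through groupby(...).resample(..., on=...)
by_day = filtered.groupby(['GROUP', pd.Grouper(key='DATETIME', freq='D')])['X'].sum()
print(by_day)
```

As far as I can tell, this is not a drop-in replacement for line C: a multi-key groupby reports the days that actually occur in each group, so the empty days that `resample` fills with 0 would still need to be added separately if they are required.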

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.16.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.0
matplotlib : 3.0.2
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.4.4
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

Thanks!!

rhshadrach · 2022-05-24T21:20:44Z

A & B now give the same output on main (line A above); D still raises.
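To re-check this on whatever pandas version is installed, something like the following could be used. It is only a sketch: `make_filtered` and `probe` are helper names made up for this snippet, and the output for line D depends on the pandas version, so none is shown here.

```python
import pandas as pd


def make_filtered():
    # Same data as the original report, with the FILTER mask already applied
    df = pd.DataFrame({
        'DATETIME': pd.to_datetime(['2018-01-01 11:25:00', '2018-01-01 11:50:00',
                                    '2018-01-03 10:30:00', '2018-01-04 10:25:00'] * 2),
        'GROUP':    ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'FILTER':   [True, True, True, True, False, False, True, True],
        'X':        [1, 2, 3, 4, 5, 6, 7, 8],
    })
    return df[df.FILTER]


def probe(label, fn):
    # Run fn and report either the number of result rows or the exception type
    try:
        result = fn()
    except Exception as exc:
        return f"{label}: raised {type(exc).__name__}"
    return f"{label}: ok, {len(result)} rows"


print(pd.__version__)

# Line C: resample on the index
line_c = probe('C (index)', lambda: make_filtered().set_index('DATETIME')
               .groupby('GROUP').resample('D').X.sum())

# Line D: resample on the column
line_d = probe('D (column)', lambda: make_filtered()
               .groupby('GROUP').resample('D', on='DATETIME').X.sum())

print(line_c)
print(line_d)
```

If line D reports `ok` with the same number of rows as line C, the two code paths agree for this example on that version.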

jbrockmendel added the Groupby and Resample labels Dec 9, 2019

mroeschke added the Bug label May 11, 2020
