Resampling on index and column after groupby give different results #30057

Open
danguetta opened this issue Dec 4, 2019 · 1 comment

Code Sample, a copy-pastable example if possible

This should be copy-pastable and self-contained:

import pandas as pd

def get_df():
    # Two groups (A and B) that share the same four timestamps.
    timestamps = ['2018-01-01 11:25:00', '2018-01-01 11:50:00',
                  '2018-01-03 10:30:00', '2018-01-04 10:25:00']
    return pd.DataFrame({'DATETIME': pd.to_datetime(timestamps * 2),
                         'GROUP':    ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                         'FILTER':   [True, True, True, True, False, False, True, True],
                         'X':        [1, 2, 3, 4, 5, 6, 7, 8]})

df = get_df()
df = df.set_index('DATETIME')
df.groupby('GROUP').resample('D').X.sum()                               # <-- LINE A
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01     3
#        2018-01-02     0
#        2018-01-03     3
#        2018-01-04     4
# B      2018-01-01    11
#        2018-01-02     0
#        2018-01-03     7
#        2018-01-04     8
# Name: X, dtype: int64

df = get_df()
df.groupby('GROUP').resample('D', on = 'DATETIME').X.sum()               # <-- LINE B
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01    10
# B      2018-01-03    11
#        2018-01-04    15
# Name: X, dtype: int64

df = get_df()
df = df.set_index('DATETIME')
df[df.FILTER].groupby('GROUP').resample('D').X.sum()                     # <-- LINE C
# Returns
# -------
# GROUP  DATETIME  
# A      2018-01-01    3
#        2018-01-02    0
#        2018-01-03    3
#        2018-01-04    4
# B      2018-01-03    7
#        2018-01-04    8
# Name: X, dtype: int64

df = get_df()
df[df.FILTER].groupby('GROUP').resample('D', on = 'DATETIME').X.sum()   # <-- LINE D
# Error
# -----
#    IndexError: index 6 is out of bounds for size 6

Problem description

Lines A and B are identical except that A resamples on the DatetimeIndex, while B resamples on a column holding the same values (via on='DATETIME').

Lines C and D differ in exactly the same way, with the FILTER mask applied first.

Yet each pair behaves differently: B returns different dates and sums than A, and D raises an IndexError where C returns a result.
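
A possible workaround while the on= path misbehaves: move the datetime column into the index before grouping, which is the spelling that lines A and C use. The sketch below only wraps that spelling; the helper name resample_sum_by_group and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

# Same data as get_df() in the report, rebuilt so the sketch is self-contained.
timestamps = ['2018-01-01 11:25:00', '2018-01-01 11:50:00',
              '2018-01-03 10:30:00', '2018-01-04 10:25:00']
df = pd.DataFrame({'DATETIME': pd.to_datetime(timestamps * 2),
                   'GROUP': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'FILTER': [True, True, True, True, False, False, True, True],
                   'X': [1, 2, 3, 4, 5, 6, 7, 8]})


def resample_sum_by_group(frame, time_col='DATETIME', group_col='GROUP', value_col='X'):
    # Hypothetical helper: put the datetime column into the index first, so the
    # resample runs on the index (as in lines A and C) instead of via on=.
    indexed = frame.set_index(time_col)
    return indexed.groupby(group_col).resample('D')[value_col].sum()


unfiltered = resample_sum_by_group(df)            # same spelling as line A
filtered = resample_sum_by_group(df[df.FILTER])   # same spelling as line C
print(unfiltered)
print(filtered)
```

This sidesteps the discrepancy rather than explaining it; the on= spelling should presumably give the same answer.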

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.16.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.0
matplotlib : 3.0.2
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.4.4
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

Thanks!!

@jbrockmendel jbrockmendel added Groupby Resample resample method labels Dec 9, 2019
@mroeschke mroeschke added the Bug label May 11, 2020
rhshadrach (Member) commented May 24, 2022

A & B now give the same output on main (line A above); D still raises.
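
For anyone re-checking this on their own install, a small harness along these lines may help. It runs the four variants from the report and records either the result or the exception raised. The helper names attempt and run_variants are made up for this sketch, and the outcome for B and D depends on the pandas version, so no expected output is shown.

```python
import pandas as pd


def get_df():
    # Same data as in the original report.
    timestamps = ['2018-01-01 11:25:00', '2018-01-01 11:50:00',
                  '2018-01-03 10:30:00', '2018-01-04 10:25:00']
    return pd.DataFrame({'DATETIME': pd.to_datetime(timestamps * 2),
                         'GROUP': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                         'FILTER': [True, True, True, True, False, False, True, True],
                         'X': [1, 2, 3, 4, 5, 6, 7, 8]})


def attempt(build):
    # Hypothetical helper: return the variant's result, or the exception it raised.
    try:
        return build()
    except Exception as exc:
        return exc


def run_variants():
    indexed = get_df().set_index('DATETIME')   # resample on the index
    flat = get_df()                            # resample on a column via on=
    return {
        'A': attempt(lambda: indexed.groupby('GROUP').resample('D').X.sum()),
        'B': attempt(lambda: flat.groupby('GROUP').resample('D', on='DATETIME').X.sum()),
        'C': attempt(lambda: indexed[indexed.FILTER].groupby('GROUP').resample('D').X.sum()),
        'D': attempt(lambda: flat[flat.FILTER].groupby('GROUP').resample('D', on='DATETIME').X.sum()),
    }


report = run_variants()
print('pandas', pd.__version__)
for label, outcome in report.items():
    if isinstance(outcome, Exception):
        print(label, 'raised', type(outcome).__name__, '-', outcome)
    else:
        print(label, 'returned:')
        print(outcome)

# A and B should agree if the index and column paths are consistent.
a, b = report['A'], report['B']
print('A equals B:', isinstance(a, pd.Series) and isinstance(b, pd.Series) and a.equals(b))
```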
