pandas.DataFrame.resample behaves inconsistently when the rule is set to 24H/1D and 48H/2D #24127

Closed
lucky06688 opened this issue Dec 6, 2018 · 2 comments

@lucky06688
commented Dec 6, 2018

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> pd.__version__
'0.23.4'
>>> index = pd.date_range('11/11/2000 06:00:00', periods=50, freq='H')
>>> series = pd.Series(range(50), index=index)
>>> series.head(10)
2000-11-11 06:00:00    0
2000-11-11 07:00:00    1
2000-11-11 08:00:00    2
2000-11-11 09:00:00    3
2000-11-11 10:00:00    4
2000-11-11 11:00:00    5
2000-11-11 12:00:00    6
2000-11-11 13:00:00    7
2000-11-11 14:00:00    8
2000-11-11 15:00:00    9
Freq: H, dtype: int64
>>> series.resample('24H').count()
2000-11-11    18
2000-11-12    24
2000-11-13     8
Freq: 24H, dtype: int64
>>> series.resample('1D').count()
2000-11-11    18
2000-11-12    24
2000-11-13     8
Freq: D, dtype: int64
>>> series.resample('48H').count()
2000-11-11    42
2000-11-13     8
Freq: 48H, dtype: int64
>>> series.resample('2D').count()
2000-11-11 06:00:00    48
2000-11-13 06:00:00     2
dtype: int64

Problem description

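As you can see, when I set rule to '24H' or '1D', the behavior of resample is consistent (in both cases the bins start at 0 o'clock on the first day), but for '48H' and '2D' it is not: the '48H' bins start at midnight, while the '2D' bins start at 06:00:00, the first timestamp in the index.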
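
To make the difference concrete, here is a minimal standard-library sketch that reproduces both sets of counts from the session above. It is not how pandas implements resample; count_bins is a hypothetical helper written only for this illustration, and the only thing that varies between the two calls is the anchor the 48-hour bins are measured from.

```python
from collections import Counter
from datetime import datetime, timedelta

# Same data as in the report: 50 hourly timestamps starting at 06:00.
start = datetime(2000, 11, 11, 6)
stamps = [start + timedelta(hours=i) for i in range(50)]


def count_bins(stamps, width, anchor):
    """Hypothetical helper: count timestamps per fixed-width bin.

    Bins are left-closed and laid out as anchor + k * width.
    """
    counts = Counter()
    for ts in stamps:
        k = (ts - anchor) // width  # index of the bin containing ts
        counts[anchor + k * width] += 1
    return dict(sorted(counts.items()))


width = timedelta(hours=48)
midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)

# Anchored at midnight of the first day (what '48H' does above).
anchored_midnight = count_bins(stamps, width, midnight)
# Anchored at the first timestamp (what '2D' does above).
anchored_first = count_bins(stamps, width, start)

for label, result in [("midnight", anchored_midnight), ("first", anchored_first)]:
    print(label)
    for edge, n in result.items():
        print(" ", edge, n)
```

With the midnight anchor the counts come out as 42 and 8, matching '48H'; with the first-timestamp anchor they come out as 48 and 2, matching '2D'.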

Expected Output

If the consistent behavior of '24H' and '1D' is correct, then '48H' and '2D' should also behave consistently with each other.

Output of pd.show_versions()

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.4.3
Cython: None
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@ArtinSarraf

Contributor

commented Dec 7, 2018

Seems the issue is in pandas.core.resample._get_range_edges, line 1591.

I think day_nanos % offset.nanos == 0 should be the other way around. This will only ever evaluate to True for Day(n=1). I’m guessing this wasn’t the intention, as it would be overly complicated compared to day_nanos == offset.nanos.
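
The arithmetic behind that observation can be checked without pandas. The sketch below is only a model of the condition quoted above, not the pandas source: it assumes a Day(n) offset spans n * 24 hours in nanoseconds, and current_check and swapped_check are hypothetical names for the two operand orders.

```python
# Nanoseconds in one 24-hour day.
DAY_NANOS = 24 * 60 * 60 * 10**9


def current_check(n_days):
    """Operand order as quoted: day_nanos % offset.nanos == 0."""
    return DAY_NANOS % (n_days * DAY_NANOS) == 0


def swapped_check(n_days):
    """Operands swapped: offset.nanos % day_nanos == 0."""
    return (n_days * DAY_NANOS) % DAY_NANOS == 0


# Map n -> (current, swapped) for a few Day(n) offsets.
results = {n: (current_check(n), swapped_check(n)) for n in (1, 2, 3)}
for n, (cur, swp) in results.items():
    print(f"Day(n={n}): current={cur}, swapped={swp}")
```

For n greater than 1 the quoted order leaves a remainder of one full day, so it is False, while the swapped order is True for every whole number of days.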

@ArtinSarraf

Contributor

commented Dec 7, 2018

Added the fix in this commit since I was already updating _get_range_edges. Tests pass and it now gives the expected output.

eb05501

Edit: It was suggested that a new PR be opened for this (#24195)
