
DataFrame.resample week with DST #21459

Closed
maartenhollenga opened this issue Jun 13, 2018 · 2 comments

@maartenhollenga commented Jun 13, 2018

Code Sample

import pandas
import pytz

# Regular week: no DST transition
dataframe_1 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-03-04 00:00', '2017-03-05 00:00', '2017-03-06 00:00', '2017-03-07 00:00', '2017-03-08 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_1 = dataframe_1.resample('1W').sum()
print(resampled_1)

# Week in which DST starts (2017-03-26 in Europe/Amsterdam)
dataframe_2 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-03-25 00:00', '2017-03-26 00:00', '2017-03-27 00:00', '2017-03-28 00:00', '2017-03-29 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_2 = dataframe_2.resample('1W').sum()
print(resampled_2)

# Regular week after DST has started
dataframe_3 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-04-15 00:00', '2017-04-16 00:00', '2017-04-17 00:00', '2017-04-18 00:00', '2017-04-19 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_3 = dataframe_3.resample('1W').sum()
print(resampled_3)

Problem description

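Weekly resampling with pandas.DataFrame.resample appears to mishandle the week in which DST starts.
When I run the code sample above, I expect the same output three times, but the week in which DST starts
(dataframe_2; DST began in Europe/Amsterdam on 2017-03-26) differs from the regular weeks: the value at
2017-03-27 00:00 (13) is counted in the week labelled 2017-03-26 (11 + 12 + 13 = 36) instead of the
following week, which is left with only 14 + 15 = 29.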

Actual Output

                            0
2017-03-05 00:00:00+01:00  23
2017-03-12 00:00:00+01:00  42
                            0
2017-03-26 00:00:00+01:00  36
2017-04-02 00:00:00+02:00  29
                            0
2017-04-16 00:00:00+02:00  23
2017-04-23 00:00:00+02:00  42

Expected Output

                            0
2017-03-05 00:00:00+01:00  23
2017-03-12 00:00:00+01:00  42
                            0
2017-03-26 00:00:00+01:00  23
2017-04-02 00:00:00+02:00  42
                            0
2017-04-16 00:00:00+02:00  23
2017-04-23 00:00:00+02:00  42
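For pandas versions affected by this bug (0.23.1 in this report), one possible workaround, an assumption of ours and not something proposed in this thread, is to resample on naive local wall-clock times and re-attach the time zone afterwards. The helper name `weekly_sum_wall_clock` is hypothetical. It assumes the weekly bin labels (local midnights) are neither ambiguous nor nonexistent in the index's time zone, which holds for Europe/Amsterdam.

```python
import pandas as pd


def weekly_sum_wall_clock(df):
    """Hypothetical workaround: weekly sums binned on local wall-clock time.

    Assumes the bin labels (local midnights) are valid, unambiguous local
    times in the index's time zone, as they are for Europe/Amsterdam.
    """
    tz = df.index.tz
    naive = df.tz_localize(None)         # drop tz, keep local wall-clock times
    weekly = naive.resample('1W').sum()  # W-SUN bins computed on the naive index
    return weekly.tz_localize(tz)        # re-attach the original time zone to the labels


dataframe_2 = pd.DataFrame(index=pd.DatetimeIndex(
    ['2017-03-25 00:00', '2017-03-26 00:00', '2017-03-27 00:00', '2017-03-28 00:00', '2017-03-29 00:00'],
    tz='Europe/Amsterdam'),
    data=[11, 12, 13, 14, 15])
print(weekly_sum_wall_clock(dataframe_2))
```

Because the bin edges are computed on wall-clock times, Monday 2017-03-27 00:00 falls in the following week regardless of the one-hour DST shift, giving the expected 23 / 42 split.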

Output of pd.show_versions()

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: None
pip: 10.0.0
setuptools: 28.8.0
Cython: None
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung (Member) commented Jun 13, 2018 (comment minimized)

@mroeschke (Member) commented Sep 26, 2018

Just posting for posterity; this issue and similarly #9119 can be solved if this branch is skipped:

if self.freq != 'D' and is_superperiod(self.freq, 'D'):

But I am still unsure of the logic behind why this branch exists.
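The thread does not say why this branch misbehaves. One plausible reading, an assumption consistent with the reported numbers rather than a description of pandas internals, is that the edge closing the week labelled 2017-03-26 is computed by adding an absolute 24 hours instead of one calendar day. The snippet below only contrasts the two kinds of arithmetic using public pandas types (`pd.Timedelta` for absolute time, `pd.DateOffset` for calendar time); the variable names are illustrative.

```python
import pandas as pd

# Week label for the DST week in Europe/Amsterdam (DST starts 2017-03-26 02:00).
label = pd.Timestamp('2017-03-26 00:00', tz='Europe/Amsterdam')

# Absolute arithmetic: exactly 24 hours later, which is 01:00 local after the jump.
absolute_edge = label + pd.Timedelta(days=1)

# Calendar arithmetic: the next local midnight.
calendar_edge = label + pd.DateOffset(days=1)

print(absolute_edge)
print(calendar_edge)

# The week labelled 2017-03-26 should hold everything strictly before the next
# local midnight. With the absolute edge, Monday midnight slips into that week.
monday = pd.Timestamp('2017-03-27 00:00', tz='Europe/Amsterdam')
print(monday < absolute_edge, monday < calendar_edge)
```

If the edge were computed the absolute way, the value at 2017-03-27 00:00 would be assigned to the earlier week, which matches the 36 / 29 split in the actual output.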

@mroeschke referenced this issue Oct 2, 2018 from the merged pull request #22941, "BUG: Correctly weekly resample over DST"

@jreback added this to the 0.24.0 milestone Oct 3, 2018
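The fix was merged in #22941 for the 0.24.0 milestone. On pandas 0.24.0 or later, a quick check along these lines should show the three windows from the report producing identical weekly sums. This is a sketch, not the regression test added by that pull request; the helper `weekly_sums` and the window names are hypothetical.

```python
import pandas as pd

# The three five-day windows from the report; 'dst_start' spans the DST change.
windows = {
    'regular_march': ['2017-03-04', '2017-03-05', '2017-03-06', '2017-03-07', '2017-03-08'],
    'dst_start': ['2017-03-25', '2017-03-26', '2017-03-27', '2017-03-28', '2017-03-29'],
    'regular_april': ['2017-04-15', '2017-04-16', '2017-04-17', '2017-04-18', '2017-04-19'],
}


def weekly_sums(dates):
    """Hypothetical helper: weekly (W-SUN) sums for one window, as a plain list."""
    df = pd.DataFrame(index=pd.DatetimeIndex(dates, tz='Europe/Amsterdam'),
                      data=[11, 12, 13, 14, 15])
    return df.resample('1W').sum()[0].tolist()


sums = {name: weekly_sums(dates) for name, dates in windows.items()}
print(sums)
```

On a fixed version, every window should report the same split as the expected output, with the DST week no longer standing out.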
