DataFrame.resample week with DST #21459

Closed
maartenhollenga opened this issue Jun 13, 2018 · 2 comments · Fixed by #22941
Labels: Resample (resample method), Timezones (Timezone data dtype)

@maartenhollenga

Code Sample

import pandas
import pytz

dataframe_1 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-03-04 00:00', '2017-03-05 00:00', '2017-03-06 00:00', '2017-03-07 00:00', '2017-03-08 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_1 = dataframe_1.resample('1W').sum()
print(resampled_1)

dataframe_2 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-03-25 00:00', '2017-03-26 00:00', '2017-03-27 00:00', '2017-03-28 00:00', '2017-03-29 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_2 = dataframe_2.resample('1W').sum()
print(resampled_2)

dataframe_3 = pandas.DataFrame(index=pandas.DatetimeIndex(
    ['2017-04-15 00:00', '2017-04-16 00:00', '2017-04-17 00:00', '2017-04-18 00:00', '2017-04-19 00:00'],
    tz=pytz.timezone('Europe/Amsterdam')),
    data=[11, 12, 13, 14, 15])
resampled_3 = dataframe_3.resample('1W').sum()
print(resampled_3)

Problem description

There seems to be something wrong with how pandas.DataFrame.resample (by week) handles DST.
When I run the code sample above, I expect the same weekly sums (23 and 42) for all three frames,
but the week in which DST starts (dataframe_2) differs from the regular weeks: it yields 36 and 29.
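
One thing that sets the dataframe_2 week apart can be checked directly: in Europe/Amsterdam the local day 2017-03-26 is only 23 hours long, because clocks move forward that night. The snippet below is a minimal sketch of that check and is not part of the original report. The names `midnights` and `day_lengths` are illustrative, and whether this shortened day is what trips up the weekly binning is not established here.

```python
import pandas as pd

# Local midnights around the start of DST in Europe/Amsterdam.
midnights = pd.DatetimeIndex(
    ["2017-03-25 00:00", "2017-03-26 00:00", "2017-03-27 00:00"],
    tz="Europe/Amsterdam",
)

# Elapsed time between consecutive local midnights.
day_lengths = midnights[1:] - midnights[:-1]
print(day_lengths)
```

The first interval is 24 hours and the second is 23 hours, so a weekly bin edge computed by adding fixed 24-hour days would not land on local midnight after the transition.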

Actual Output

                            0
2017-03-05 00:00:00+01:00  23
2017-03-12 00:00:00+01:00  42
                            0
2017-03-26 00:00:00+01:00  36
2017-04-02 00:00:00+02:00  29
                            0
2017-04-16 00:00:00+02:00  23
2017-04-23 00:00:00+02:00  42

Expected Output

                            0
2017-03-05 00:00:00+01:00  23
2017-03-12 00:00:00+01:00  42
                            0
2017-03-26 00:00:00+01:00  23
2017-04-02 00:00:00+02:00  42
                            0
2017-04-16 00:00:00+02:00  23
2017-04-23 00:00:00+02:00  42
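
For anyone stuck on an affected version, one possible workaround is to resample on wall-clock time: drop the timezone, resample, then re-attach the timezone. This is only a sketch and was not proposed in the thread. It assumes the weekly bin labels (local midnights) are neither ambiguous nor nonexistent in the target timezone, which holds for Europe/Amsterdam since its transitions happen after midnight. The names `df`, `naive`, and `weekly` are illustrative.

```python
import pandas as pd

tz = "Europe/Amsterdam"

# Same data as dataframe_2 in the report (the week in which DST starts).
df = pd.DataFrame(
    {0: [11, 12, 13, 14, 15]},
    index=pd.DatetimeIndex(
        ["2017-03-25 00:00", "2017-03-26 00:00", "2017-03-27 00:00",
         "2017-03-28 00:00", "2017-03-29 00:00"],
        tz=tz,
    ),
)

# Drop the timezone but keep the local wall-clock times.
naive = df.tz_localize(None)

# Weekly resampling on naive timestamps is not affected by DST.
weekly = naive.resample("1W").sum()

# Re-attach the timezone to the bin labels.
weekly = weekly.tz_localize(tz)
print(weekly)
```

With this approach the DST week produces the sums 23 and 42, labelled 2017-03-26 00:00:00+01:00 and 2017-04-02 00:00:00+02:00, which matches the expected output for dataframe_2.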

Output of pd.show_versions()

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: None
pip: 10.0.0
setuptools: 28.8.0
Cython: None
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added Timezones Timezone data dtype Resample resample method labels Jun 13, 2018
@gfyoung
Member

gfyoung commented Jun 13, 2018

cc @jreback

@mroeschke
Member

Just posting for posterity: this issue, and similarly #9119, can be resolved if this branch is skipped:

if self.freq != 'D' and is_superperiod(self.freq, 'D'):

But I am still unsure of the logic behind why this branch exists.
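
To see when the quoted condition is true, it can be evaluated outside the resampler for plain frequency strings. This is a rough sketch rather than the pandas internals: `takes_branch` is a hypothetical helper, the real code compares `self.freq` (an offset object) rather than a string, and the sketch assumes `is_superperiod` is still importable from `pandas.tseries.frequencies`, which is not documented as public API.

```python
from pandas.tseries.frequencies import is_superperiod


def takes_branch(freq: str) -> bool:
    """Hypothetical helper mirroring the quoted condition for a frequency string."""
    return freq != "D" and is_superperiod(freq, "D")


weekly = takes_branch("W")  # the rule used in the report ('1W')
daily = takes_branch("D")   # plain daily resampling
print(weekly, daily)
```

Under these assumptions the weekly rule satisfies the condition and the daily rule does not, so the branch in question is taken for the '1W' resample in this report.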
