BUG: resample with tz-aware: Values falls after last bin #15549

Closed
ahcub opened this Issue Mar 2, 2017 · 2 comments

ahcub commented Mar 2, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000], tz='UTC').tz_convert('America/Chicago')

print(index)

df = pd.DataFrame([1, 2], index=index)

print(df.resample('12h', closed='right', label='right').last().ffill())

Problem description

Resampling does not handle a non-UTC, tz-aware index properly when the index spans a daylight saving time change; it fails with ValueError: Values falls after last bin.

The problem occurs in the file https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/resample.py
function: _get_time_bins
code: binner = labels = DatetimeIndex(freq=self.freq,...
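
To make the daylight saving time change concrete, here is a small sketch (not part of the pandas source) that prints the UTC offsets of the two timestamps from the code sample. It builds the index with pd.to_datetime(..., utc=True) rather than the DatetimeIndex constructor; that substitution is my own choice for illustration.

```python
import pandas as pd

# The two timestamps from the code sample, converted to America/Chicago.
index = pd.to_datetime(
    [1450137600000000000, 1474059600000000000], utc=True
).tz_convert('America/Chicago')

# The first timestamp is in December (standard time), the second in
# September (daylight time), so their UTC offsets differ by one hour.
offsets = [ts.utcoffset() for ts in index]
print(offsets)
```

Because the offsets differ, the local wall-clock span between the two endpoints is not the same as the elapsed span in UTC, which is the kind of mismatch the bin edges have to cope with.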

This problem can be solved by converting ax to UTC before resampling and converting back to the original tz after the DatetimeIndex is created.

The code would then look like this:

    tz = ax.tz
    ax = ax.tz_convert('UTC')
    if len(ax) == 0:
        binner = labels = DatetimeIndex(
            data=[], freq=self.freq, name=ax.name)
        return binner, [], labels

    first, last = ax.min(), ax.max()
    first, last = _get_range_edges(first, last, self.freq,
                                   closed=self.closed,
                                   base=self.base)
    # GH #12037
    # use first/last directly instead of call replace() on them
    # because replace() will swallow the nanosecond part
    # thus last bin maybe slightly before the end if the end contains
    # nanosecond part and lead to `Values falls after last bin` error
    binner = labels = DatetimeIndex(freq=self.freq,
                                    start=first,
                                    end=last,
                                    name=ax.name).tz_convert(tz)

This causes the bins to always be aligned to UTC times rather than to the original tz, but I think that is acceptable behaviour as well.
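
The same idea can also be tried from user code without touching pandas internals. The following is a minimal sketch under that assumption: convert the frame to UTC, resample, then convert the result back. The variable names local_tz and result are only illustrative, and, as noted above, the bins end up aligned to UTC rather than to Chicago wall-clock time.

```python
import pandas as pd

index = pd.to_datetime(
    [1450137600000000000, 1474059600000000000], utc=True
).tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)

# Remember the original time zone so it can be restored afterwards.
local_tz = df.index.tz

result = (
    df.tz_convert('UTC')   # resample in UTC, where there is no DST change
      .resample('12h', closed='right', label='right')
      .last()
      .ffill()
      .tz_convert(local_tz)  # restore the original time zone on the labels
)
print(result.head())
```

This does not fix the underlying bug; it only sidesteps it for callers who can accept UTC-aligned bins.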

Expected Output

I expect the resampling to be successful regardless of the time range selected

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.0.2
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.9.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
ahcub commented Mar 2, 2017

I can create a pull request with the change I described if it looks OK.

jreback commented Mar 2, 2017

Might be (tangentially) related to #12351 and #12037, though I suspect the tz is messing with this.

Sure, a PR to fix this would be great.

@jreback jreback added this to the Next Major Release milestone Mar 2, 2017

@jreback jreback changed the title from ValueError: Values falls after last bin to BUG: resample with tz-aware: Values falls after last bin Mar 2, 2017

ahcub added a commit to ahcub/pandas that referenced this issue Mar 2, 2017

@jreback jreback modified the milestones: Next Major Release, 0.21.1 Nov 20, 2017

jreback added a commit that referenced this issue Nov 21, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017
