resample gives AmbiguousTimeError when index ends on a DST boundary #10117

Closed
JackKelly opened this issue May 12, 2015 · 4 comments

@JackKelly
Contributor

commented May 12, 2015

Here's the bug:

In [27]: idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 00:30:00", 
                             freq="30T", tz="Europe/London") 

In [28]: series = pd.Series(np.random.randn(len(idx)), index=idx)

In [31]: series
Out[31]: 
2014-10-25 22:00:00+01:00   -0.874014
2014-10-25 22:30:00+01:00    1.316258
2014-10-25 23:00:00+01:00   -1.334616
2014-10-25 23:30:00+01:00   -1.200390
2014-10-26 00:00:00+01:00   -0.341764
2014-10-26 00:30:00+01:00    1.509091
Freq: 30T, dtype: float64

In [29]: series.resample('30T')
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
<ipython-input-29-bb9e86068ce1> in <module>()
----> 1 series.resample('30T')

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3195                               fill_method=fill_method, convention=convention,
   3196                               limit=limit, base=base)
-> 3197         return sampler.resample(self).__finalize__(self)
   3198 
   3199     def first(self, offset):

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     83 
     84         if isinstance(ax, DatetimeIndex):
---> 85             rs = self._resample_timestamps()
     86         elif isinstance(ax, PeriodIndex):
     87             offset = to_offset(self.freq)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _resample_timestamps(self, kind)
    273         axlabels = self.ax
    274 
--> 275         self._get_binner_for_resample(kind=kind)
    276         grouper = self.grouper
    277         binner = self.binner

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_binner_for_resample(self, kind)
    121             kind = self.kind
    122         if kind is None or kind == 'timestamp':
--> 123             self.binner, bins, binlabels = self._get_time_bins(ax)
    124         elif kind == 'timedelta':
    125             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_time_bins(self, ax)
    162         first, last = ax.min(), ax.max()
    163         first, last = _get_range_edges(first, last, self.freq, closed=self.closed,
--> 164                                        base=self.base)
    165         tz = ax.tz
    166         binner = labels = DatetimeIndex(freq=self.freq,

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_range_edges(first, last, offset, closed, base)
    392         if (is_day and day_nanos % offset.nanos == 0) or not is_day:
    393             return _adjust_dates_anchored(first, last, offset,
--> 394                                           closed=closed, base=base)
    395 
    396     if not isinstance(offset, Tick):  # and first.time() != last.time():

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _adjust_dates_anchored(first, last, offset, closed, base)
    459 
    460     return (Timestamp(fresult).tz_localize(first_tzinfo),
--> 461             Timestamp(lresult).tz_localize(first_tzinfo))
    462 
    463 

pandas/tslib.pyx in pandas.tslib.Timestamp.tz_localize (pandas/tslib.c:10535)()

pandas/tslib.pyx in pandas.tslib.tz_localize_to_utc (pandas/tslib.c:50297)()

AmbiguousTimeError: Cannot infer dst time from Timestamp('2014-10-26 01:00:00'), 
try using the 'ambiguous' argument
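The last frame shows where the error comes from. `_adjust_dates_anchored` appears to compute the end bin edge as a naive wall-clock time and then localize it with `Timestamp(lresult).tz_localize(first_tzinfo)`. Here that edge is 2014-10-26 01:00:00. On that date, clocks in Europe/London went back from 02:00 BST to 01:00 GMT, so 01:00 local time occurs twice and cannot be localized without a hint.

The following is a minimal sketch. It assumes a recent pandas in which `Timestamp.tz_localize` accepts `ambiguous=True` (the DST reading) or `ambiguous=False` (the standard-time reading):

```python
import pandas as pd

# The naive bin edge named in the traceback above.
naive_edge = pd.Timestamp("2014-10-26 01:00:00")

# On 2014-10-26 Europe/London fell back from BST (+01:00) to GMT (+00:00)
# at 02:00 BST, so the local hour starting at 01:00 happened twice.
as_bst = naive_edge.tz_localize("Europe/London", ambiguous=True)   # DST reading
as_gmt = naive_edge.tz_localize("Europe/London", ambiguous=False)  # standard-time reading

print(as_bst)           # 2014-10-26 01:00:00+01:00
print(as_gmt)           # 2014-10-26 01:00:00+00:00
print(as_gmt - as_bst)  # 0 days 01:00:00

# With the default ambiguous="raise", pandas refuses to pick a reading.
try:
    naive_edge.tz_localize("Europe/London")
    raised = False
except Exception:  # catch broadly rather than assume one exception class
    raised = True
print(raised)  # True
```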

This is my hacky work-around:

In [35]: series.tz_convert('UTC').resample('30T').tz_convert('Europe/London')
Out[35]: 
2014-10-25 22:00:00+01:00   -0.874014
2014-10-25 22:30:00+01:00    1.316258
2014-10-25 23:00:00+01:00   -1.334616
2014-10-25 23:30:00+01:00   -1.200390
2014-10-26 00:00:00+01:00   -0.341764
2014-10-26 00:30:00+01:00    1.509091
Freq: 30T, dtype: float64
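On a current pandas, the work-around above needs two adjustments. First, `.resample()` returns a `Resampler`, so an aggregation such as `.mean()` must be called explicitly (0.16 took the mean by default). Second, newer versions prefer the frequency spelling `"30min"` over `"30T"`.

The following is a minimal sketch assuming pandas 2.x. It uses deterministic values in place of `np.random.randn`:

```python
import numpy as np
import pandas as pd

# Rebuild the index from the report: 30-minute steps ending at 00:30 BST,
# just before the 2014-10-26 fall-back transition in Europe/London.
idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 00:30:00",
                    freq="30min", tz="Europe/London")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Bin in UTC, where no wall-clock time is ambiguous, then convert back.
resampled = (series.tz_convert("UTC")
                   .resample("30min")
                   .mean()            # explicit aggregation on modern pandas
                   .tz_convert("Europe/London"))
print(resampled)
```

Because every 30-minute bin holds exactly one sample, the result should match the input series value for value.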

The bug disappears if the end date of the index is beyond the DST boundary:

In [37]: idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 02:30:00",
                             freq="30T", tz="Europe/London") 

In [38]: series = pd.Series(np.random.randn(len(idx)), index=idx)

In [39]: series.resample('30T')
Out[39]: 
2014-10-25 22:00:00+01:00   -0.626598
2014-10-25 22:30:00+01:00    1.799176
2014-10-25 23:00:00+01:00   -0.388075
2014-10-25 23:30:00+01:00    0.641487
2014-10-26 00:00:00+01:00   -0.488203
2014-10-26 00:30:00+01:00    0.477301
2014-10-26 01:00:00+01:00    0.040997
2014-10-26 01:30:00+01:00   -1.996542
2014-10-26 01:00:00+00:00   -0.016655
2014-10-26 01:30:00+00:00   -1.445823
2014-10-26 02:00:00+00:00   -0.713523
2014-10-26 02:30:00+00:00   -0.122274
Freq: 30T, dtype: float64
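The 12 rows above are expected. 01:00 and 01:30 each appear twice, once in BST (+01:00) and once in GMT (+00:00), so the index spans more real time than wall-clock time.

The following is a small sketch of that arithmetic. It assumes a recent pandas and uses `"30min"` in place of `"30T"`:

```python
import pandas as pd

idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 02:30:00",
                    freq="30min", tz="Europe/London")

# Real elapsed time: 22:00 BST is 21:00 UTC and 02:30 GMT is 02:30 UTC,
# i.e. 5.5 hours, which is 11 half-hour steps and therefore 12 labels.
elapsed = idx[-1] - idx[0]
print(len(idx), elapsed)  # 12 0 days 05:30:00

# Dropping the time zone keeps local wall-clock times: the span is only
# 4.5 hours, and the repeated 01:00 and 01:30 show up as duplicates.
wall = idx.tz_localize(None)
wall_span = wall[-1] - wall[0]
print(wall_span)                 # 0 days 04:30:00
print(wall.duplicated().sum())   # 2
```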

All is fine if the start date is on a DST boundary but the end date is beyond the boundary:

In [47]: idx = pd.date_range("2014-10-26 00:30:00", "2014-10-26 02:30:00",
                             freq="30T", tz="Europe/London") 

In [48]: series = pd.Series(np.random.randn(len(idx)), index=idx)

In [49]: series.resample('30T')
Out[49]: 
2014-10-26 00:30:00+01:00    2.371051
2014-10-26 01:00:00+01:00   -0.033473
2014-10-26 01:30:00+01:00   -0.988517
2014-10-26 01:00:00+00:00   -0.664475
2014-10-26 01:30:00+00:00    0.865772
2014-10-26 02:00:00+00:00   -1.051219
2014-10-26 02:30:00+00:00   -1.478477
Freq: 30T, dtype: float64
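The three cases in this report seem to follow one pattern. The error appears only when the end bin edge, computed in local wall-clock time as the last label plus one step, lands inside the repeated 01:00 to 02:00 hour.

The helper `end_edge_fails_to_localize` below is hypothetical. It is not pandas' code; it only mimics the naive arithmetic that the traceback suggests, and it assumes a recent pandas:

```python
import pandas as pd


def end_edge_fails_to_localize(idx, freq):
    """Hypothetical helper (not part of pandas): add one step to the last
    label in local wall-clock time, then try to localize the result."""
    # Drop the tz but keep local wall-clock time, then step forward.
    naive_edge = idx[-1].tz_localize(None) + pd.Timedelta(freq)
    try:
        naive_edge.tz_localize(idx.tz)  # default ambiguous="raise"
    except Exception:  # catch broadly rather than assume one exception class
        return True
    return False


cases = {
    "ends 00:30 BST": ("2014-10-25 22:00:00", "2014-10-26 00:30:00"),
    "ends 02:30 GMT": ("2014-10-25 22:00:00", "2014-10-26 02:30:00"),
    "starts 00:30 BST": ("2014-10-26 00:30:00", "2014-10-26 02:30:00"),
}
results = {}
for name, (start, end) in cases.items():
    idx = pd.date_range(start, end, freq="30min", tz="Europe/London")
    results[name] = end_edge_fails_to_localize(idx, "30min")
print(results)
```

Only the first case puts the naive end edge at 01:00. The other two put it at 03:00, which is unambiguous.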

(As always, I must say a huge THANK YOU to everyone working on Pandas; it really is a great bit of software)

In [50]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.16.1
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)

Possibly related:

JackKelly added a commit to nilmtk/nilmtk that referenced this issue May 12, 2015

@JackKelly

Contributor Author

commented Aug 17, 2015

I saw that this bug is still open, so I thought I should re-test with Pandas 0.16.2 to see whether the bug had been fixed along the way. Unfortunately, this bug still exists. I have included the new traceback below (because its line numbers differ from those in the traceback above):

In [79]: import pandas as pd

In [80]: idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 00:30:00",freq="30T", 
          tz="Europe/London") 

In [81]: series = pd.Series(np.random.randn(len(idx)), index=idx)

In [82]: series
Out[82]: 
2014-10-25 22:00:00+01:00   -1.315553
2014-10-25 22:30:00+01:00    0.294073
2014-10-25 23:00:00+01:00    0.067067
2014-10-25 23:30:00+01:00    0.710251
2014-10-26 00:00:00+01:00   -0.192490
2014-10-26 00:30:00+01:00    0.661763
Freq: 30T, dtype: float64

In [83]: series.resample('30T')
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
<ipython-input-83-bb9e86068ce1> in <module>()
----> 1 series.resample('30T')

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3264                               fill_method=fill_method, convention=convention,
   3265                               limit=limit, base=base)
-> 3266         return sampler.resample(self).__finalize__(self)
   3267 
   3268     def first(self, offset):

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     80 
     81         if isinstance(ax, DatetimeIndex):
---> 82             rs = self._resample_timestamps()
     83         elif isinstance(ax, PeriodIndex):
     84             offset = to_offset(self.freq)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _resample_timestamps(self, kind)
    270         axlabels = self.ax
    271 
--> 272         self._get_binner_for_resample(kind=kind)
    273         grouper = self.grouper
    274         binner = self.binner

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_binner_for_resample(self, kind)
    118             kind = self.kind
    119         if kind is None or kind == 'timestamp':
--> 120             self.binner, bins, binlabels = self._get_time_bins(ax)
    121         elif kind == 'timedelta':
    122             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_time_bins(self, ax)
    159         first, last = ax.min(), ax.max()
    160         first, last = _get_range_edges(first, last, self.freq, closed=self.closed,
--> 161                                        base=self.base)
    162         tz = ax.tz
    163         binner = labels = DatetimeIndex(freq=self.freq,

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_range_edges(first, last, offset, closed, base)
    389         if (is_day and day_nanos % offset.nanos == 0) or not is_day:
    390             return _adjust_dates_anchored(first, last, offset,
--> 391                                           closed=closed, base=base)
    392 
    393     if not isinstance(offset, Tick):  # and first.time() != last.time():

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _adjust_dates_anchored(first, last, offset, closed, base)
    456 
    457     return (Timestamp(fresult).tz_localize(first_tzinfo),
--> 458             Timestamp(lresult).tz_localize(first_tzinfo))
    459 
    460 

pandas/tslib.pyx in pandas.tslib.Timestamp.tz_localize (pandas/tslib.c:10551)()

pandas/tslib.pyx in pandas.tslib.tz_localize_to_utc (pandas/tslib.c:50619)()

AmbiguousTimeError: Cannot infer dst time from Timestamp('2014-10-26 01:00:00'), 
try using the 'ambiguous' argument
@jreback

Contributor

commented Aug 17, 2015

if it were fixed it would be closed

pull requests are welcome

@jreback jreback added this to the Next Major Release milestone Aug 17, 2015

@jreback

Contributor

commented Aug 17, 2015

cc @rockg

@linar-jether


commented Jul 30, 2017

This also happens when the index starts on a DST boundary:

import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['a', 'b', 'c'], index=pd.date_range('2014-03-09 03:00', '2015-03-09 03:00', freq='H', tz='America/Chicago')).assign(a=np.random.rand(), b=np.random.rand(), c=np.random.rand())
df.resample('H', label='right', closed='right').sum()

results in: NonExistentTimeError: 2014-03-09 02:00:00
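The spring-forward case is the mirror image of the fall-back case. On 2014-03-09, clocks in America/Chicago jumped from 02:00 CST to 03:00 CDT. A right-labelled bin edge at 02:00 local time therefore does not exist.

The following is a minimal sketch. It assumes a pandas recent enough that `Timestamp.tz_localize` accepts a `nonexistent` argument:

```python
import pandas as pd

# The bin edge named in the NonExistentTimeError above.
naive_edge = pd.Timestamp("2014-03-09 02:00:00")

# Default behaviour (nonexistent="raise"): localization fails.
try:
    naive_edge.tz_localize("America/Chicago")
    raised = False
except Exception:  # catch broadly rather than assume one exception class
    raised = True
print(raised)  # True

# Shift the missing time forward to the first wall-clock time that exists.
shifted = naive_edge.tz_localize("America/Chicago", nonexistent="shift_forward")
print(shifted)  # 2014-03-09 03:00:00-05:00
```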
