BUG: DatetimeIndex.shift(freq=...) raises near DST boundary #8616

ischwabacher opened this issue Oct 23, 2014 · 3 comments


commented Oct 23, 2014

xref #5694 #8531 (?)
xref #8817

This is presumably caused by the fact that pytz time zones internalize the offset of the current time.

In [1]: import pandas as pd

In [2]: idx = pd.date_range('2013-11-03', tz='America/Chicago',
   ...:                     periods=6, freq='H')

In [3]: pd.Series(index=idx)
2013-11-03 00:00:00-05:00   NaN
2013-11-03 01:00:00-05:00   NaN
2013-11-03 01:00:00-06:00   NaN
2013-11-03 02:00:00-06:00   NaN
2013-11-03 03:00:00-06:00   NaN
2013-11-03 04:00:00-06:00   NaN
Freq: H, dtype: float64

In [4]: pd.Series(index=idx).shift(freq='H')
AssertionError                            Traceback (most recent call last)
<ipython-input-4-19ee418c9aa1> in <module>()
----> 1 pd.Series(index=idx).shift(freq='H')

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in shift(self, periods, freq, axis, **kwds)
   3290             new_data = self._data.shift(periods=periods, axis=block_axis)
   3291         else:
-> 3292             return self.tshift(periods, freq, **kwds)
   3294         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in tshift(self, periods, freq, axis, **kwds)
   3386         else:
   3387             new_data = self._data.copy()
-> 3388             new_data.axes[block_axis] = index.shift(periods, offset)
   3390         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in shift(self, n, freq)
    855         end = self[-1] + n * self.offset
    856         return DatetimeIndex(start=start, end=end, freq=self.offset,
--> 857                    ,
    859     def repeat(self, repeats, axis=None):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, **kwds)
    204             return cls._generate(start, end, periods, name, freq,
    205                                  tz=tz, normalize=normalize, closed=closed,
--> 206                                  infer_dst=infer_dst)
    208         if not isinstance(data, (np.ndarray, ABCSeries)):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in _generate(cls, start, end, periods, name, offset, tz, normalize, infer_dst, closed)
    369         if tz is not None and inferred_tz is not None:
    370             if not inferred_tz == tz:
--> 371                 raise AssertionError("Inferred time zone not equal to passed "
    372                                      "time zone")

AssertionError: Inferred time zone not equal to passed time zone

Playing with dateutil and pytz timezones makes me despair of repairing that assertion to be anything sane. Is it actually needed, or can we just turn it into a warning?


  • two pytz.DstTzInfos of the same zoneinfo zone that have been .normalize()ed to different times may compare different
  • a pytz.*TzInfo has a zone member containing the name of the zoneinfo zone from which it was constructed
  • a dateutil.tzfile has no public members beyond the tzinfo API, which is insufficient to tell whether two time zones are equal
  • a pytz time zone and a dateutil time zone appear to compare different under all circumstances, regardless of whether they represent the same time zone
  • the various dateutil time zone classes have no common base class within the dateutil package
  • pytz zones constructed from different names for the same zoneinfo zone (e.g. UTC and Etc/UTC) compare different


from #8817

import pandas as pd
import pytz
import datetime

dt = datetime.datetime(2014, 11, 14, 0)
dt_est = pytz.timezone('EST').localize(dt)
s = pd.Series(data=[1], index=[dt_est])

s.shift(0, freq='h')  # 2014-11-14 00:00:00-05:00 (seems okay) 
s.shift(-1, freq='h')  # 2014-11-13 18:00:00-05:00 (expected 2014-11-13 23:00:00)
s.shift(1, freq='h')  # 2014-11-13 20:00:00-05:00 (expected 2014-11-14 01:00:00)

s.shift(-1, freq='s')  # 2014-11-13 18:59:59-05:00 (same with other freq)

commented Oct 24, 2014

Sounds like you need an ambiguous keyword here to figure out what the user wants? Or is this just not well defined? Or is it just implemented incorrectly?

E.g., it seems to me you would convert to UTC, shift, then convert back to the tz.
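
A minimal sketch of that convert/shift/convert-back idea, using only the standard library and a toy stand-in for America/Chicago around the 2013 fall-back. The names CDT, CST, FALL_BACK_UTC, to_chicago and shift are made up for illustration; this is not how pandas implements shift.

```python
from datetime import datetime, timedelta, timezone

# Toy stand-in for America/Chicago: two fixed offsets and one transition.
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")
# DST ended at 02:00 CDT on 2013-11-03, i.e. 07:00 UTC.
FALL_BACK_UTC = datetime(2013, 11, 3, 7, tzinfo=timezone.utc)


def to_chicago(instant):
    """Render an aware datetime with whichever offset is in force at that instant."""
    utc = instant.astimezone(timezone.utc)
    return utc.astimezone(CDT if utc < FALL_BACK_UTC else CST)


def shift(local_times, delta):
    """Convert each time to UTC, add the offset, then convert back."""
    return [to_chicago(t.astimezone(timezone.utc) + delta) for t in local_times]


start = datetime(2013, 11, 3, 0, tzinfo=CDT)
idx = [to_chicago(start + timedelta(hours=h)) for h in range(6)]
shifted = shift(idx, timedelta(hours=1))

for before, after in zip(idx, shifted):
    print(before.isoformat(), "->", after.isoformat())
```

Because the arithmetic happens in UTC, the repeated 01:00 wall-clock hour is never ambiguous: 01:00-05:00 shifts to 01:00-06:00, and the offset is chosen only when converting back.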


commented Oct 24, 2014

Additionally, the same time zone may not compare equal across different versions of pytz. This causes problems when manipulating data stored in pickles/HDF5 by prior versions (see #7620). It may not be a problem in this particular spot, but it is certainly another comparison issue to add to your list.


Contributor Author

commented Oct 24, 2014

> sounds like you need an ambiguous kw here to figure out what the user wants? or is this just not well defined? or is it just implemented incorrectly

It's just a vectorized addition of an offset to a bunch of Timestamps, which is well-defined regardless of whether the Timestamps are aware or not. The problem isn't that the computation is wrong; it's that the assertion at the end is too strict.
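
To illustrate the point, here is a small sketch. It assumes a recent pandas in which adding a Timedelta to a tz-aware DatetimeIndex is supported, so it sidesteps the 2014 shift code path from the traceback rather than reproducing it. The variable names are only for illustration.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Same index as in the report, spanning the 2013 fall-back transition.
idx = pd.date_range('2013-11-03', tz='America/Chicago',
                    periods=6, freq=one_hour)

# Vectorized addition of a fixed offset: every instant moves by exactly one hour.
shifted = idx + one_hour

# An explicit UTC round trip lands on the same instants in the same zone.
round_trip = (idx.tz_convert('UTC') + one_hour).tz_convert('America/Chicago')

print(shifted)
print((shifted == round_trip).all())
```

The result still spans two UTC offsets of the same zone, which is the situation the time zone equality assertion has to tolerate.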

Trying to fix this pushed PEP 431 several notches up my list of things to be excited about.

