BUG: Resampling a TZ-aware Series fails if rule='W': "Values falls before first bin" #9119

Closed
JackKelly opened this issue Dec 20, 2014 · 3 comments · Fixed by #22941
Labels
Resample resample method Timezones Timezone data dtype

@JackKelly
Contributor

In [1]: import pandas as pd

In [2]: rng = pd.date_range("2013-04-01", "2013-05-01", tz='Europe/London', freq='H')

In [4]: series = pd.Series(index=rng)

In [6]: series.resample('W')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-906374246edd> in <module>()
----> 1 series.resample('W')

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3003                               fill_method=fill_method, convention=convention,
   3004                               limit=limit, base=base)
-> 3005         return sampler.resample(self).__finalize__(self)
   3006 
   3007     def first(self, offset):

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     83 
     84         if isinstance(ax, DatetimeIndex):
---> 85             rs = self._resample_timestamps()
     86         elif isinstance(ax, PeriodIndex):
     87             offset = to_offset(self.freq)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _resample_timestamps(self, kind)
    273         axlabels = self.ax
    274 
--> 275         self._get_binner_for_resample(kind=kind)
    276         grouper = self.grouper
    277         binner = self.binner

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_binner_for_resample(self, kind)
    121             kind = self.kind
    122         if kind is None or kind == 'timestamp':
--> 123             self.binner, bins, binlabels = self._get_time_bins(ax)
    124         elif kind == 'timedelta':
    125             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_time_bins(self, ax)
    182 
    183         # general version, knowing nothing about relative frequencies
--> 184         bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
    185 
    186         if self.closed == 'right':

/usr/local/lib/python2.7/dist-packages/pandas/lib.so in pandas.lib.generate_bins_dt64 (pandas/lib.c:17825)()

ValueError: Values falls before first bin

Resampling to weekly works fine if we first remove the timezone:

In [7]: series.tz_localize(None).resample('W')
Out[7]: 
2013-04-07   NaN
2013-04-14   NaN
2013-04-21   NaN
2013-04-28   NaN
2013-05-05   NaN
Freq: W-SUN, dtype: float64
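
As a stopgap on affected versions, the tz-naive route above can be extended by re-attaching the timezone after resampling. This is only a sketch under a few assumptions: it uses the newer `.resample(...).mean()` call style rather than the 0.15-era call shown in this report, it fills the Series with dummy numeric data, and it relies on this date range containing no DST transition (stripping and re-localizing the timezone can fail or shift values when one does).

```python
import numpy as np
import pandas as pd

# Hourly tz-aware index, as in the report. A Timedelta is used for the
# frequency so the snippet does not depend on the 'H' vs 'h' alias spelling.
rng = pd.date_range("2013-04-01", "2013-05-01", tz="Europe/London",
                    freq=pd.Timedelta(hours=1))
# Assumption: dummy numeric data (the report used an empty, all-NaN Series).
series = pd.Series(np.arange(len(rng), dtype="float64"), index=rng)

# Drop the timezone (keeping local wall-clock times), resample weekly,
# then re-attach the timezone to the weekly labels.
naive_weekly = series.tz_localize(None).resample("W").mean()
weekly = naive_weekly.tz_localize("Europe/London")

print(weekly)
```

The weekly labels fall on Sunday midnights in local time, matching the tz-naive output above but carrying the Europe/London timezone again.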

And resampling the TZ-aware Series to daily works fine:

In [8]: series.resample('D')
Out[8]: 
2013-04-01 00:00:00+01:00   NaN
2013-04-02 00:00:00+01:00   NaN
2013-04-03 00:00:00+01:00   NaN
2013-04-04 00:00:00+01:00   NaN
2013-04-05 00:00:00+01:00   NaN
2013-04-06 00:00:00+01:00   NaN
2013-04-07 00:00:00+01:00   NaN
2013-04-08 00:00:00+01:00   NaN
2013-04-09 00:00:00+01:00   NaN
2013-04-10 00:00:00+01:00   NaN
2013-04-11 00:00:00+01:00   NaN
2013-04-12 00:00:00+01:00   NaN
2013-04-13 00:00:00+01:00   NaN
2013-04-14 00:00:00+01:00   NaN
2013-04-15 00:00:00+01:00   NaN
2013-04-16 00:00:00+01:00   NaN
2013-04-17 00:00:00+01:00   NaN
2013-04-18 00:00:00+01:00   NaN
2013-04-19 00:00:00+01:00   NaN
2013-04-20 00:00:00+01:00   NaN
2013-04-21 00:00:00+01:00   NaN
2013-04-22 00:00:00+01:00   NaN
2013-04-23 00:00:00+01:00   NaN
2013-04-24 00:00:00+01:00   NaN
2013-04-25 00:00:00+01:00   NaN
2013-04-26 00:00:00+01:00   NaN
2013-04-27 00:00:00+01:00   NaN
2013-04-28 00:00:00+01:00   NaN
2013-04-29 00:00:00+01:00   NaN
2013-04-30 00:00:00+01:00   NaN
2013-05-01 00:00:00+01:00   NaN
Freq: D, dtype: float64

This issue is similar (but not identical) to #1459 and #8941.

My install versions:

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.3
pytz: 2014.10
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.6
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.3.8
sqlalchemy: None
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)

(as always, I must applaud the Pandas dev team. It's an awesome animal!)

@JackKelly JackKelly changed the title Resampling a TZ-aware Series fails if rule='W': "Values falls before first bin" BUG: Resampling a TZ-aware Series fails if rule='W': "Values falls before first bin" Dec 20, 2014
@rockg
Contributor

rockg commented Dec 21, 2014

This may or may not be fixed by #5172. Those are specific DST issues, but I saw some more general things there that might fix this overall. I will see whether fixing those cases fixes this one as well.

@jreback jreback added the Indexing, Resample, and Timezones labels and removed the Indexing label Dec 22, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 22, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@JackKelly
Contributor Author

I have just checked on pandas 0.16.1 and can confirm that this issue is still present (which I think we knew, but I wanted to be sure!). The line numbers in the exception have changed a bit:

In [1]: import pandas as pd

In [2]: rng = pd.date_range("2013-04-01", "2013-05-01", tz='Europe/London', freq='H')

In [3]: series = pd.Series(index=rng)

In [4]: series.resample('W')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-906374246edd> in <module>()
----> 1 series.resample('W')

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3195                               fill_method=fill_method, convention=convention,
   3196                               limit=limit, base=base)
-> 3197         return sampler.resample(self).__finalize__(self)
   3198 
   3199     def first(self, offset):

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     83 
     84         if isinstance(ax, DatetimeIndex):
---> 85             rs = self._resample_timestamps()
     86         elif isinstance(ax, PeriodIndex):
     87             offset = to_offset(self.freq)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _resample_timestamps(self, kind)
    273         axlabels = self.ax
    274 
--> 275         self._get_binner_for_resample(kind=kind)
    276         grouper = self.grouper
    277         binner = self.binner

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_binner_for_resample(self, kind)
    121             kind = self.kind
    122         if kind is None or kind == 'timestamp':
--> 123             self.binner, bins, binlabels = self._get_time_bins(ax)
    124         elif kind == 'timedelta':
    125             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in _get_time_bins(self, ax)
    182 
    183         # general version, knowing nothing about relative frequencies
--> 184         bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
    185 
    186         if self.closed == 'right':

pandas/lib.pyx in pandas.lib.generate_bins_dt64 (pandas/lib.c:18952)()

ValueError: Values falls before first bin
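
For anyone re-checking this on another pandas version, the reproduction can be wrapped in a small helper that reports whether the weekly resample still raises. This is a hedged sketch: `weekly_bin_count` is a hypothetical helper name, the Series holds dummy zeros, and it uses the newer `.resample(...).mean()` call style, so it is not meant to run unchanged on the 0.15/0.16 releases shown here.

```python
import numpy as np
import pandas as pd


def weekly_bin_count(tz="Europe/London"):
    """Return the number of weekly bins, or None if resampling raises ValueError.

    Hypothetical helper for illustration; not part of pandas.
    """
    rng = pd.date_range("2013-04-01", "2013-05-01", tz=tz,
                        freq=pd.Timedelta(hours=1))
    # Assumption: dummy zero data; only the index matters for this check.
    series = pd.Series(np.zeros(len(rng)), index=rng)
    try:
        return len(series.resample("W").mean())
    except ValueError:
        # This is the failure mode reported in this issue.
        return None


result = weekly_bin_count()
status = "still failing" if result is None else f"{result} weekly bins"
print(pd.__version__, "->", status)
```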

@rockg
Contributor

rockg commented May 12, 2015

Yes, this is outstanding. It was a different type of issue that wasn't easily fixed at the time.
