ValueError: Values falls after last bin when Resampling using pd.tseries.offsets.Nano as period #12037

marcelnem · 2016-01-14T10:40:40Z

I have a time series in a DataFrame named dfi, with non-equispaced times as the index:

print dfi.value
print
print "len "+ str(len(dfi))

Output:

datetime
2015-10-01 13:58:10.427   -10.072100
2015-10-01 13:58:11.419   -10.072100
2015-10-01 13:58:12.417   -10.072100
2015-10-01 13:58:13.420   -10.072100
2015-10-01 13:58:14.426   -10.072100
2015-10-01 13:58:15.427   -10.072100
2015-10-01 13:58:16.418   -10.072100
2015-10-01 13:58:17.418    -9.753230
2015-10-01 13:58:18.416    -9.753230
2015-10-01 13:58:19.428    -9.753230
2015-10-01 13:58:20.427    -9.753230
2015-10-01 13:58:21.419    -9.753230
2015-10-01 13:58:22.416    -9.753230
2015-10-01 13:58:23.429    -9.753230
2015-10-01 13:58:24.416    -9.753230
2015-10-01 13:58:25.428    -9.753230
2015-10-01 13:58:26.418    -9.753230
2015-10-01 13:58:27.416    -9.396140
2015-10-01 13:58:28.416    -9.396140
2015-10-01 13:58:29.429    -9.396140
2015-10-01 13:58:32.416    -9.396140
2015-10-01 13:58:33.427    -9.396140
2015-10-01 13:58:34.428    -9.396140
2015-10-01 13:58:35.462    -9.396140
2015-10-01 13:58:36.416    -9.396140
2015-10-01 13:58:37.427    -9.010000
2015-10-01 13:58:38.428    -9.010000
2015-10-01 13:58:39.435    -9.010000
2015-10-01 13:58:40.437    -9.010000
2015-10-01 13:58:41.416    -9.010000
                             ...    
2015-10-03 23:59:28.052    -0.759718
2015-10-03 23:59:29.040    -0.759718
2015-10-03 23:59:30.048    -0.759718
2015-10-03 23:59:31.048    -0.759718
2015-10-03 23:59:32.060    -0.759718
2015-10-03 23:59:33.049    -0.759718
2015-10-03 23:59:34.051    -0.759718
2015-10-03 23:59:35.041    -0.759718
2015-10-03 23:59:36.061    -0.759718
2015-10-03 23:59:37.059    -1.010490
2015-10-03 23:59:38.040    -1.010490
2015-10-03 23:59:39.051    -1.010490
2015-10-03 23:59:40.040    -1.010490
2015-10-03 23:59:41.072    -1.010490
2015-10-03 23:59:42.049    -1.010490
2015-10-03 23:59:43.038    -1.010490
2015-10-03 23:59:44.040    -1.010490
2015-10-03 23:59:45.040    -1.010490
2015-10-03 23:59:48.049    -1.133730
2015-10-03 23:59:49.049    -1.133730
2015-10-03 23:59:50.048    -1.133730
2015-10-03 23:59:52.050    -1.133730
2015-10-03 23:59:53.050    -1.133730
2015-10-03 23:59:54.059    -1.133730
2015-10-03 23:59:55.049    -1.133730
2015-10-03 23:59:56.041    -1.133730
2015-10-03 23:59:59.039    -1.296430
2015-10-04 00:00:00.050    -1.296430
2015-10-04 00:00:01.060    -1.296430
2015-10-04 00:00:02.040    -1.296430
Name: value, dtype: float64

I get an error when running this code:

print period_seconds
period_nanos=int(period_seconds*(10**9))
print period_nanos
res= dfi.value.resample(pd.tseries.offsets.Nano(period_nanos), how=[np.min, np.max,'mean'])

Output + error:

4.035752
4035751999

ValueError                                Traceback (most recent call last)
<ipython-input-14-92e377227823> in <module>()
      5     period_nanos=int(period_seconds*(10**9))
      6     print period_nanos
----> 7     res= dfi.value.resample(pd.tseries.offsets.Nano(period_nanos), how=[np.min, np.max,'mean'])
      8 
      9     nullrows=pd.isnull(res).any(1).nonzero()[0]

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\core\generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3641                               fill_method=fill_method, convention=convention,
   3642                               limit=limit, base=base)
-> 3643         return sampler.resample(self).__finalize__(self)
   3644 
   3645     def first(self, offset):

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in resample(self, obj)
     80 
     81         if isinstance(ax, DatetimeIndex):
---> 82             rs = self._resample_timestamps()
     83         elif isinstance(ax, PeriodIndex):
     84             offset = to_offset(self.freq)

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _resample_timestamps(self, kind)
    274         axlabels = self.ax
    275 
--> 276         self._get_binner_for_resample(kind=kind)
    277         grouper = self.grouper
    278         binner = self.binner

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _get_binner_for_resample(self, kind)
    118             kind = self.kind
    119         if kind is None or kind == 'timestamp':
--> 120             self.binner, bins, binlabels = self._get_time_bins(ax)
    121         elif kind == 'timedelta':
    122             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _get_time_bins(self, ax)
    179 
    180         # general version, knowing nothing about relative frequencies
--> 181         bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
    182 
    183         if self.closed == 'right':

pandas\lib.pyx in pandas.lib.generate_bins_dt64 (pandas\lib.c:20875)()

ValueError: Values falls after last bin
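A side note on the output above: `period_nanos` prints as 4035751999 rather than 4035752000 because `int()` truncates a float product that lands just below the whole number. The thread does not identify this as the cause of the ValueError, so the following is only a sketch of the conversion issue, with `round()` and `decimal.Decimal` as two possible alternatives (the variable names are illustrative):

```python
from decimal import Decimal

period_seconds = 4.035752

# int() truncates toward zero, so a product that is a hair below
# a whole number of nanoseconds loses one nanosecond.
truncated = int(period_seconds * 10**9)

# round() picks the nearest integer instead of truncating.
rounded = round(period_seconds * 10**9)

# Decimal built from a string avoids binary floating point entirely.
exact = int(Decimal("4.035752") * 10**9)

print(truncated, rounded, exact)
```

Either alternative yields a whole-microsecond period here, which may or may not avoid the error on an affected pandas version.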

Package versions:

import pip
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
     for i in installed_packages])
for i in installed_packages_list:
    print i

Output:

alabaster==0.7.6
anaconda-client==1.2.1
argcomplete==1.0.0
astropy==1.1.1
babel==2.1.1
backports-abc==0.4
backports.ssl-match-hostname==3.4.0.2
beautifulsoup4==4.4.1
bitarray==0.8.1
blaze==0.9.0
bokeh==0.11.0
boto==2.38.0
bottleneck==1.0.0
cdecimal==2.3
cffi==1.2.1
clyent==1.2.0
colorama==0.3.3
comtypes==1.1.2
conda-build==1.18.2
conda-env==2.4.5
conda==3.19.0
configobj==5.0.6
cryptography==0.9.1
cycler==0.9.0
cython==0.23.4
cytoolz==0.7.4
datashape==0.5.0
decorator==4.0.6
docutils==0.12
enum34==1.1.2
et-xmlfile==1.0.1
fastcache==1.0.2
flask==0.10.1
funcsigs==0.4
futures==3.0.3
gevent-websocket==0.9.3
gevent==1.0.1
greenlet==0.4.9
grin==1.2.1
h5py==2.5.0
idna==2.0
ipaddress==1.0.14
ipykernel==4.1.1
ipython-genutils==0.1.0
ipython==4.0.1
ipywidgets==4.1.0
itsdangerous==0.24
jdcal==1.2
jedi==0.9.0
jinja2==2.8
jsonschema==2.4.0
jupyter-client==4.1.1
jupyter-console==4.0.3
jupyter-core==4.0.6
jupyter==1.0.0
llvmlite==0.8.0
lxml==3.5.0
markupsafe==0.23
matplotlib==1.5.1
menuinst==1.3.2
mistune==0.7.1
multipledispatch==0.4.8
nbconvert==4.1.0
nbformat==4.0.1
networkx==1.10
nltk==3.1
nose==1.3.7
notebook==4.1.0
numba==0.22.1
numexpr==2.4.6
numpy==1.10.1
odo==0.4.0
openpyxl==2.3.2
pandas==0.17.1
path.py==0.0.0
patsy==0.4.0
pep8==1.6.2
pickleshare==0.5
pillow==3.0.0
pip==7.1.2
ply==3.8
psutil==3.2.2
py==1.4.30
pyasn1==0.1.9
pycosat==0.6.1
pycparser==2.14
pycrypto==2.6.1
pycurl==7.19.5.3
pyflakes==1.0.0
pygments==2.0.2
pyopenssl==0.15.1
pyparsing==2.0.3
pyreadline==2.1
pytest==2.8.1
python-dateutil==2.4.2
pytz==2015.7
pywin32==219
pyyaml==3.11
pyzmq==15.2.0
qtconsole==4.1.1
requests==2.9.0
rope==0.9.4
scikit-image==0.11.3
scikit-learn==0.17
scipy==0.16.0
setuptools==19.1.1
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
snowballstemmer==1.2.0
sockjs-tornado==1.0.1
sphinx-rtd-theme==0.1.7
sphinx==1.3.1
spyder==2.3.8
sqlalchemy==1.0.11
statsmodels==0.6.1
sympy==0.7.6.1
tables==3.2.2
toolz==0.7.4
tornado==4.3
traitlets==4.0.0
ujson==1.33
unicodecsv==0.14.1
werkzeug==0.11.3
wheel==0.26.0
xlrd==0.9.4
xlsxwriter==0.7.7
xlwings==0.6.1
xlwt==1.0.0


jreback · 2016-01-14T12:24:54Z

xref #9119

This does look buggy. Can you post a simpler, easily reproducible example that can be copy-pasted?

marcelnem · 2016-01-14T14:02:36Z

Here is a simpler, reproducible example.

Running in an IPython notebook with Python 2:

import pandas as pd
import numpy as np

start=1443707890427
end=1443916802040
dif=end-start
length=1000
np.random.seed(seed=16516)
timestamps=np.random.random_integers(0,dif,length);
timestamps =timestamps+start
timestamps = np.sort(timestamps)

datetimes=pd.to_datetime(timestamps,unit="ms")
values = np.random.rand(length)

dt_test=pd.DataFrame(values,columns=["value"],index=datetimes)
print "dt_test.head()"
print dt_test.head()
print

period_seconds=4.035752
print "period_seconds"
print period_seconds
print
period_nanos=int(period_seconds*(10**9))
print "period_nanos"
print period_nanos

res= dt_test.value.resample(pd.tseries.offsets.Nano(period_nanos), how=[np.min, np.max,'mean'])

Output:

dt_test.head()
                            value
2015-10-01 13:59:11.020  0.795006
2015-10-01 13:59:30.583  0.725395
2015-10-01 14:01:31.597  0.731184
2015-10-01 14:06:40.423  0.982237
2015-10-01 14:08:28.432  0.014274

period_seconds
4.035752

period_nanos
4035751999
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-f6df1fcb5427> in <module>()
     27 print period_nanos
     28 
---> 29 res= dt_test.value.resample(pd.tseries.offsets.Nano(period_nanos), how=[np.min, np.max,'mean'])

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\core\generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   3641                               fill_method=fill_method, convention=convention,
   3642                               limit=limit, base=base)
-> 3643         return sampler.resample(self).__finalize__(self)
   3644 
   3645     def first(self, offset):

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in resample(self, obj)
     80 
     81         if isinstance(ax, DatetimeIndex):
---> 82             rs = self._resample_timestamps()
     83         elif isinstance(ax, PeriodIndex):
     84             offset = to_offset(self.freq)

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _resample_timestamps(self, kind)
    274         axlabels = self.ax
    275 
--> 276         self._get_binner_for_resample(kind=kind)
    277         grouper = self.grouper
    278         binner = self.binner

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _get_binner_for_resample(self, kind)
    118             kind = self.kind
    119         if kind is None or kind == 'timestamp':
--> 120             self.binner, bins, binlabels = self._get_time_bins(ax)
    121         elif kind == 'timedelta':
    122             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

C:\Users\USER1\Anaconda2\lib\site-packages\pandas\tseries\resample.pyc in _get_time_bins(self, ax)
    179 
    180         # general version, knowing nothing about relative frequencies
--> 181         bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
    182 
    183         if self.closed == 'right':

pandas\lib.pyx in pandas.lib.generate_bins_dt64 (pandas\lib.c:20875)()

ValueError: Values falls after last bin

BranYang · 2016-01-20T14:33:07Z

The issue is caused by lines 164-165 in pandas/tseries/resample.py:

binner = labels = DatetimeIndex(freq=self.freq,
                                start=first.replace(tzinfo=None),
                                # replace will truncate to microsecond
                                end=last.replace(tzinfo=None),
                                tz=tz,
                                name=ax.name)

Consider this example

In [1]: import pandas as pd

In [2]: from pandas.tseries.index import DatetimeIndex

In [3]: s_ns = 1443707950041939524

In [4]: itvl = 10**9

In [5]: e_ns = s_ns + itvl

In [6]: s = pd.Timestamp(s_ns).tz_localize(None)

In [7]: e = pd.Timestamp(e_ns).tz_localize(None)

In [8]: e
Out[8]: Timestamp('2015-10-01 13:59:11.041939524')

In [9]: indx = DatetimeIndex(freq=pd.tseries.offsets.Nano(itvl/20),start=s, end=e,tz=None)

In [10]: indx[-1]
Out[10]: Timestamp('2015-10-01 13:59:11.041939524', offset='50000000N')

In [11]: replaced = DatetimeIndex(freq=pd.tseries.offsets.Nano(itvl/20),start=s.replace(tzinfo=None), end=e.replace(tzinfo=None),tz=None)

In [12]: replaced[-1]
Out[12]: Timestamp('2015-10-01 13:59:11.041939', offset='50000000N')

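When replace is used, the last bin edge is truncated to 13:59:11.041939 and falls short of the true end (13:59:11.041939524), so the final value lies outside the bins.
Should we consider not using replace, given its current behavior (i.e., it throws away the nanosecond information)?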
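The effect described above can be sketched with plain integer arithmetic, without pandas. This is only an illustration using the numbers from the session above; `truncate_to_microseconds` and `bin_edges` are hypothetical helpers, not pandas functions, and the real binning code is more involved.

```python
NS_PER_US = 1000

def truncate_to_microseconds(ns):
    # Mimics what a microsecond-resolution replace does to a
    # nanosecond timestamp: the sub-microsecond part is dropped.
    return ns - ns % NS_PER_US

def bin_edges(start_ns, end_ns, step_ns):
    # Evenly spaced edges from start_ns up to and including end_ns.
    return list(range(start_ns, end_ns + 1, step_ns))

s_ns = 1443707950041939524
itvl = 10**9
e_ns = s_ns + itvl
step = itvl // 20

exact = bin_edges(s_ns, e_ns, step)
truncated = bin_edges(truncate_to_microseconds(s_ns),
                      truncate_to_microseconds(e_ns), step)

# With exact edges the last edge reaches the last value; with
# truncated edges it stops 524 ns short of it.
print(exact[-1] == e_ns, e_ns - truncated[-1])
```

A value stamped at `e_ns` then lies after the last truncated edge, which is the situation the error message describes.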

jreback · 2016-01-20T15:40:45Z

@BranYang hmm, that does look likely.

Timestamp.replace is pretty naive in that it doesn't understand nanoseconds at all. So it is indeed dropping the nanos.

What you need to do is fix that as I believe this is a symptom of an invalid replace.

Want to take a crack at it? Look in tslib.pyx/Timestamp.
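One way a nanosecond-preserving replace could work is to set the sub-microsecond remainder aside, apply the ordinary datetime replace, and then add the remainder back. The sketch below uses only the standard library; `replace_keep_nanos` is a hypothetical helper for illustration and is not the change that was eventually made in tslib.pyx.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def replace_keep_nanos(ns_since_epoch, **fields):
    # Hypothetical helper: split off the nanoseconds that
    # datetime cannot represent (it has microsecond resolution).
    micros, nanos = divmod(ns_since_epoch, 1000)
    dt = EPOCH + timedelta(microseconds=micros)
    # Apply the requested field replacement at microsecond resolution.
    dt = dt.replace(**fields)
    delta = dt - EPOCH
    total_us = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    # Re-attach the nanoseconds that were set aside.
    return total_us * 1000 + nanos

e_ns = 1443707951041939524
unchanged = replace_keep_nanos(e_ns, tzinfo=None)
zero_seconds = replace_keep_nanos(e_ns, second=0)
print(unchanged == e_ns, zero_seconds % 1000)
```

Replacing `tzinfo` leaves the value untouched, and replacing `second` changes only the seconds while the trailing 524 ns survive.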

jreback · 2016-01-20T15:43:09Z

A couple of other issues might be showing similar symptoms, e.g. #6085 (and those linked from there). If this proves to fix them, we will want to add tests for those as well.

…ano as period Closes pandas-dev#12037 Author: Bran Yang <snowolfy@163.com> Closes pandas-dev#12270 from BranYang/nanosec and squashes the following commits: bff0c85 [Bran Yang] Add to whatsnew and some comments fd0b307 [Bran Yang] Fix pandas-dev#12037 Error when Resampling using pd.tseries.offsets.Nano as period

jreback added Bug Resample resample method Difficulty Intermediate labels Jan 14, 2016

jreback added this to the Next Major Release milestone Jan 14, 2016

jreback added the Timeseries label Jan 20, 2016

BranYang mentioned this issue Feb 9, 2016

Fix #12037 Error when Resampling using pd.tseries.offsets.Nano as period #12270

Closed

jreback closed this as completed in ab29f93 Feb 10, 2016

jreback mentioned this issue Mar 2, 2017

BUG: resample with tz-aware: Values falls after last bin #15549

Closed
