ValueError exception with pd.resample #8683

Closed
amelio-vazquez-reina opened this issue Oct 30, 2014 · 4 comments · Fixed by #8941
Labels
Bug Resample resample method
Milestone

Comments

@amelio-vazquez-reina
Contributor

When running df.resample('2200L', how='sum', label='right') with df:

2014-10-14 23:06:07.440000    6.44000
2014-10-14 23:06:07.761000    5.09600
2014-10-14 23:06:08.215000    6.44000
2014-10-14 23:06:08.486000    6.44000
2014-10-14 23:06:08.509000    5.20800
2014-10-14 23:06:08.789000    4.02842
2014-10-14 23:06:10.795000    5.65600
2014-10-14 23:06:11.618000    6.21600
2014-10-14 23:06:12.177000    6.21600
2014-10-14 23:06:14.620000    5.10720
2014-10-14 23:06:16.698000    5.95840
2014-10-14 23:06:16.745000    6.44000
2014-10-14 23:06:20.548000    6.21600
2014-10-14 23:06:20.549000    6.44000
2014-10-14 23:06:20.551000    5.95840
2014-10-14 23:06:23.206000    6.44000
2014-10-14 23:06:29.977000    6.44000
2014-10-14 23:06:35.307000    5.20800
2014-10-15 23:00:00               NaN
2014-10-15 23:00:02.200000        NaN
Name: spend, dtype: float64

I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-71895ab1ef27> in <module>()
----> 1 R_inst = loaded_sorted.tail(200).head(20).resample('2200L', how='sum', label='right')

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   2978                               fill_method=fill_method, convention=convention,
   2979                               limit=limit, base=base)
-> 2980         return sampler.resample(self).__finalize__(self)
   2981 
   2982     def first(self, offset):

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/resample.py in resample(self, obj)
     83 
     84         if isinstance(ax, DatetimeIndex):
---> 85             rs = self._resample_timestamps()
     86         elif isinstance(ax, PeriodIndex):
     87             offset = to_offset(self.freq)

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/resample.py in _resample_timestamps(self, kind)
    273         axlabels = self.ax
    274 
--> 275         self._get_binner_for_resample(kind=kind)
    276         grouper = self.grouper
    277         binner = self.binner

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/resample.py in _get_binner_for_resample(self, kind)
    121             kind = self.kind
    122         if kind is None or kind == 'timestamp':
--> 123             self.binner, bins, binlabels = self._get_time_bins(ax)
    124         elif kind == 'timedelta':
    125             self.binner, bins, binlabels = self._get_time_delta_bins(ax)

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/tseries/resample.py in _get_time_bins(self, ax)
    182 
    183         # general version, knowing nothing about relative frequencies
--> 184         bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
    185 
    186         if self.closed == 'right':

/Users/josh/anaconda3/envs/py34/lib/python3.4/site-packages/pandas/lib.so in pandas.lib.generate_bins_dt64 (pandas/lib.c:17928)()

ValueError: Values falls after last bin

with Python 3.4.1 :: Anaconda 2.1.0 (x86_64) and:

Cython==0.21
DataShape==0.3.0
Flask==0.10.1
Jinja2==2.7.3
MarkupSafe==0.23
Pillow==2.5.1
PyYAML==3.11
Pygments==1.6
SQLAlchemy==0.9.7
Sphinx==1.2.3
Theano==0.6.0
Werkzeug==0.9.6
XlsxWriter==0.5.7
abstract-rendering==0.5.1
appnope==0.1.0
argcomplete==0.8.1
arrow==0.4.4
astropy==0.4.2
beautifulsoup4==4.3.2
binstar==0.7.1
bitarray==0.8.1
blaze==0.6.3
blz==0.6.2
bokeh==0.6.1
boto==2.34.0
cffi==0.8.6
colorama==0.3.1
configobj==5.0.6
cryptography==0.5.4
cytoolz==0.7.0
decorator==3.4.0
docutils==0.12
future==0.13.1
greenlet==0.4.4
h5py==2.3.1
ipython==2.2.0
itsdangerous==0.24
jdcal==1.0
jedi==0.8.1-final0
## FIXME: could not find svn URL in dependency_links for this package:
joblib==0.8.3-r1
llvmpy==0.12.7
lpsolve55==5.5.2.0
lxml==3.4.0
matplotlib==1.4.0
mock==1.0.1
multipledispatch==0.4.7
networkx==1.9.1
nltk==3.0.0
nose==1.3.4
numba==0.14.0
numexpr==2.3.1
numpy==1.9.0
openpyxl==1.8.5
pandas==0.15.0
parse==1.6.4
patsy==0.3.0
ply==3.4
psutil==2.1.1
psycopg2==2.5.4
py==1.4.25
pyOpenSSL==0.14
pycosat==0.6.1
pycparser==2.10
pycrypto==2.6.1
pycurl==7.19.5
pyflakes==0.8.1
pymc==3.0
pyparsing==2.0.1
pytest==2.6.3
python-dateutil==2.1
pytz==2014.7
pyzmq==14.3.1
qds-sdk==1.2.2
redis==2.9.1
requests==2.4.3
rope-py3k==0.9.4-1
runipy==0.1.1
scikit-image==0.10.1
scikit-learn==0.15.2
scipy==0.14.0
seaborn==0.5.dev
six==1.8.0
sockjs-tornado==1.0.1
spyder==2.3.1
statsmodels==0.5.0
sympy==0.7.5
tables==3.1.1
toolz==0.7.0
tornado==4.0.2
ujson==1.33
xlrd==0.9.3
@amelio-vazquez-reina amelio-vazquez-reina changed the title ValueError exception in Resampling ValueError exception with pd.resample Oct 30, 2014
@jorisvandenbossche jorisvandenbossche added Bug Resample resample method labels Oct 30, 2014
@jorisvandenbossche
Member

I can reproduce this with your data (and the error message is in any case not very helpful), but could you try to pin down which 'feature' of the data is causing it?
E.g. filling the NaNs does not matter, but the choice of frequency (e.g. 2000L vs. 2200L) does.

@amelio-vazquez-reina
Contributor Author

@jorisvandenbossche You are right, it also fails with fillna(0). The resample also fails with the following:

2014-10-14 23:06:23.206000    6.440
2014-10-14 23:06:29.977000    6.440
2014-10-14 23:06:35.307000    5.208
2014-10-15 23:00:00             0
2014-10-15 23:00:02.200000      0
Name: spend, dtype: float64

@jreback jreback added this to the 0.15.2 milestone Oct 31, 2014
@jreback
Contributor

jreback commented Oct 31, 2014

This has to do, I think, with a bug in the edge detection (_adjust_bin_edges). There are quite a lot of edge cases.
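To illustrate why bad bin edges surface as the error in the traceback, here is a simplified pure-Python sketch of a linear-scan binning check. It assumes sorted integer values and sorted bin edges; `generate_bins` is a hypothetical stand-in written for this thread, not the actual `pandas.lib.generate_bins_dt64` implementation.

```python
def generate_bins(values, edges, closed="left"):
    """Return, for each bin, the running count of values seen up to its right edge.

    Hypothetical illustration only: `values` and `edges` are sorted lists of
    integers (e.g. milliseconds). This is not the pandas routine itself.
    """
    if not values or len(edges) < 2:
        raise ValueError("Need values and at least two bin edges")
    if values[0] < edges[0]:
        raise ValueError("Values falls before first bin")
    # If the computed edges stop short of the data, this check fires.
    if values[-1] > edges[-1]:
        raise ValueError("Values falls after last bin")

    bins = []
    j = 0  # index of the next value that has not been assigned to a bin
    for right_edge in edges[1:]:
        if closed == "right":
            while j < len(values) and values[j] <= right_edge:
                j += 1
        else:
            while j < len(values) and values[j] < right_edge:
                j += 1
        bins.append(j)
    return bins


values = [0, 500, 2300, 4500]

# Edges that cover the data: binning succeeds.
covered = generate_bins(values, [0, 2200, 4400, 6600])

# Edges that end before the last value: the same error message as in the report.
try:
    generate_bins(values, [0, 2200, 4400])
except ValueError as exc:
    message = str(exc)
```

In this sketch the error is only a symptom: the binning step is fine, and the fault lies in whatever computed edges that do not reach the last value.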

@jorisvandenbossche
Member

BTW, a small code sample to reproduce it:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=pd.date_range('2014-10-14 23:06:23.206', periods=3, freq='400L')|pd.date_range('2014-10-15 23:00:00', periods=2, freq='2200L'))

s.resample('2200L', 'mean')
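For anyone trying this on a recent pandas release, the snippet above needs a few spelling changes. The following is a sketch under assumptions, not part of the original report: it uses the `ms` alias instead of `L`, `.union()` instead of `|`, and the method-chaining form `.resample(...).mean()` instead of the `how` argument. The seeded random generator is also my choice, made only so the run is repeatable. On a version that includes the fix, it is expected to complete without the ValueError.

```python
import numpy as np
import pandas as pd

# Three points 400 ms apart on the first day ...
first_day = pd.date_range("2014-10-14 23:06:23.206", periods=3, freq="400ms")
# ... and two points 2200 ms apart almost a full day later.
second_day = pd.date_range("2014-10-15 23:00:00", periods=2, freq="2200ms")

index = first_day.union(second_day)
rng = np.random.default_rng(0)  # seeded only for repeatability (assumption)
s = pd.Series(rng.standard_normal(len(index)), index=index)

# Resample into 2200 ms bins; empty bins come back as NaN for mean().
result = s.resample("2200ms").mean()
```

Most of the resulting bins are empty because of the long gap between the two days; only the bins that contain the five input points carry a value.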

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Nov 29, 2014
hkleynhans added a commit to hkleynhans/pandas that referenced this issue Nov 30, 2014
Fixes an issue where resampling over multiple days causes a ValueError
when the number of days between the normalized first and normalized last
days is not a multiple of the frequency.

Added test TestResample.test_resample_anchored_multiday

Closes pandas-dev#8683
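The condition described in the commit message can be checked with plain date arithmetic. The sketch below assumes that "normalized" means truncated to midnight; the helper name `days_align_with_freq` is hypothetical and is not part of pandas or of the fix itself.

```python
from datetime import datetime, timedelta


def days_align_with_freq(first, last, freq):
    """Return True if the span between the midnights of `first` and `last`
    is a whole multiple of `freq` (hypothetical helper for illustration)."""
    first_midnight = first.replace(hour=0, minute=0, second=0, microsecond=0)
    last_midnight = last.replace(hour=0, minute=0, second=0, microsecond=0)
    span = last_midnight - first_midnight
    return span % freq == timedelta(0)


# First and last timestamps from the reduced data set in this issue.
first = datetime(2014, 10, 14, 23, 6, 23, 206000)
last = datetime(2014, 10, 15, 23, 0, 2, 200000)

aligned_2200 = days_align_with_freq(first, last, timedelta(milliseconds=2200))
aligned_2000 = days_align_with_freq(first, last, timedelta(milliseconds=2000))
```

One day is 86,400,000 ms, which leaves a remainder of 1,600 ms when divided by 2,200 but divides evenly by 2,000. That is consistent with the earlier observation that the choice of frequency matters.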
@jreback jreback modified the milestones: 0.15.2, 0.16.0 Nov 30, 2014