Resample yields empty groups #10603

JonasAbernot · 2015-07-16T16:48:05Z

With some parameters, the last group yielded by resample is empty. Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(10000,4)))
df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')

df.loc['1s':,:].resample('3s',how=lambda x : len(x))

Depending on the 'how' function used, this can lead to surprising bugs.

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.16.0-294-g45f69cd
nose: 1.3.6
Cython: 0.20.2
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 3.0.0-dev
sphinx: 1.2.2
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.8.2
pymysql: None
psycopg2: 2.5.3 (dt dec mx pq3 ext)


jreback · 2015-07-16T16:56:20Z

In [7]: pd.set_option('max_rows',12)

In [8]: df.loc['1s':,:]
Out[8]: 
                        0         1         2         3
00:00:01         0.767847 -1.805006  0.513914 -0.533759
00:00:01.003906  1.034297 -0.873930  1.254777 -0.460738
00:00:01.007812 -1.905457 -0.497061 -0.550036 -0.400423
00:00:01.011718  0.526214 -0.569812 -0.817764  1.204511
00:00:01.015625  0.061491  0.939611  0.308094  1.300434
00:00:01.019531 -0.147869  0.971442  1.239615  0.637635
...                   ...       ...       ...       ...
00:00:39.039062  1.664856 -0.821650 -0.551620 -0.442644
00:00:39.042968  1.133944  0.797726 -0.677378 -0.488098
00:00:39.046875 -0.343148 -0.123394 -1.010421  1.476257
00:00:39.050781  0.311632 -0.418035 -1.200112 -1.735927
00:00:39.054687  0.291330 -0.559795 -0.516269  1.088944
00:00:39.058593  0.918740 -0.516714 -0.415188  0.106167

[9744 rows x 4 columns]

In [9]: df.loc['1s':,:].resample('3s',how=lambda x : len(x))
Out[9]: 
            0    1    2    3
00:00:01  768  768  768  768
00:00:04  768  768  768  768
00:00:07  768  768  768  768
00:00:10  768  768  768  768
00:00:13  768  768  768  768
00:00:16  768  768  768  768
...       ...  ...  ...  ...
00:00:25  768  768  768  768
00:00:28  768  768  768  768
00:00:31  768  768  768  768
00:00:34  768  768  768  768
00:00:37  528  528  528  528
00:00:40    0    0    0    0

[14 rows x 4 columns]

Looks correct. The last group is just a point, as this is evenly divisible.

jreback · 2015-07-16T16:57:06Z

FYI, also show the actual data (generated from the code) when reporting a bug; then the problem is clear simply by looking at it.

jreback · 2015-07-16T16:57:47Z

Any reason you are not using how='count'? It's the same result, just much faster.
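In current pandas the `how=` keyword is no longer accepted, and aggregations are chained on the resampler instead (the `.apply(lambda x: len(x))` spelling appears later in this thread). The sketch below compares the two spellings. The random seed and the `"3906250ns"` frequency alias are choices made for this example. Note that `count()` matches `len` here only because the data has no missing values: `count()` skips NaN, while `len` counts every row.

```python
import numpy as np
import pandas as pd

# Same layout as the report: 10000 rows spaced 3906250 ns (1/256 s) apart.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(10000, 4)),
    index=pd.timedelta_range(start="0s", periods=10000, freq="3906250ns"),
)
window = df.loc["1s":, :]

# Equivalent of how='count': built-in per-bin count of non-NaN values.
via_count = window.resample("3s").count()

# Equivalent of how=lambda x: len(x): one Python call per bin and per column.
via_len = window.resample("3s").apply(lambda x: len(x))

# Without NaNs the two agree (dtypes may differ, so compare the raw values).
print(np.array_equal(via_count.to_numpy(), via_len.to_numpy()))
```

The built-in `count()` avoids calling a Python function for every bin and column, which is where the speed difference comes from.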

JonasAbernot · 2015-07-17T09:11:54Z

Yep, OK, I wasn't clear enough. The last group contains no points: the index ends just after 39 s (00:00:39.058593), and the last group begins at 40 s. For the 'count' task this is not actually a problem, but for functions that can't handle zero-length objects it can be annoying.
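The claim that a 40 s group should not exist can be checked with integer arithmetic on the index. The sketch below is pure Python. It assumes left-closed 3 s bins anchored at the first timestamp of the slice, which matches the bin labels shown in this thread. It is an illustration, not pandas's own binning code.

```python
# Index of the report, in integer nanoseconds.
STEP_NS = 3_906_250            # freq='3906250n': 3.90625 ms between rows
SECOND_NS = 1_000_000_000
BIN_NS = 3 * SECOND_NS         # resample('3s')

samples_per_second = SECOND_NS // STEP_NS            # 256 rows per second
positions = [k * STEP_NS for k in range(10_000)]     # full index
window = [t for t in positions if t >= SECOND_NS]    # df.loc['1s':,:]

first, last = window[0], window[-1]
# A bin is needed only while its left edge is <= the last timestamp.
edges = list(range(first, last + 1, BIN_NS))
counts = [sum(1 for t in window if e <= t < e + BIN_NS) for e in edges]

# Rows that would fall into a further bin starting right after the last edge (40 s).
rows_at_or_after_40s = sum(1 for t in window if t >= edges[-1] + BIN_NS)

print(len(edges), counts[-1], rows_at_or_after_40s)
```

Under these assumptions, 9744 rows split into 12 full bins of 768 rows (3 s times 256 rows/s) and a final bin of 528 rows starting at 37 s. No rows are left for a bin at 40 s.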

Example:

df = pd.DataFrame(np.random.normal(size=(10000,4)))
df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')
from scipy import fft

Something that works:

In [25]: df.resample('3s',how=lambda x : max(fft(x)))
Out[25]: 
                  0          1          2          3
00:00:00  55.527131  63.876320  50.189927  60.702282
00:00:03  53.586627  63.214890  55.694863  55.196211
00:00:06  63.159294  51.598472  61.389132  60.393747
00:00:09  73.133776  63.760377  69.555783  64.445265
00:00:12  60.349962  48.913074  50.045405  57.562742
00:00:15  58.858030  49.733304  55.012356  62.641561
...             ...        ...        ...        ...
00:00:24  59.661202  61.519860  49.886808  49.105434
00:00:27  48.506358  55.936740  52.039330  57.650969
00:00:30  52.030271  58.446403  59.234081  64.254844
00:00:33  57.767135  56.672450  52.793359  69.297208
00:00:36  56.431251  64.871565  63.356116  67.926122
00:00:39   5.341347   5.263054   5.745918   4.918816

[14 rows x 4 columns]

Something that doesn't:

In [26]: df.loc['1s':,:].resample('3s',how=lambda x : max(fft(x)))
Out[26]: 
Empty DataFrame
Columns: []
Index: []

Just because an empty group is generated (which is weird), fft fails, and this silently leads to this empty DataFrame.

I hope this is clearer now.

(As for not using 'count', the only reason is that I didn't know about it.)
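One defensive option, independent of how pandas forms the bins, is to make the aggregation function tolerate zero-length input. The sketch below is an illustration and does not come from the thread. `peak_fft_magnitude` is a hypothetical helper name. It uses `np.fft.fft` because in current SciPy `scipy.fft` is a module rather than a function. It also takes the largest *magnitude* of the FFT coefficients instead of `max()` over complex values, which is a deliberate substitution for the original lambda.

```python
import numpy as np
import pandas as pd


def peak_fft_magnitude(x):
    """Hypothetical helper: largest |FFT| coefficient of one bin of one column.

    Returns NaN for an empty bin instead of letting np.fft.fft raise.
    """
    values = np.asarray(x, dtype=float)
    if values.size == 0:
        # np.fft.fft raises ValueError for zero-length input.
        return np.nan
    return float(np.abs(np.fft.fft(values)).max())


rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(10000, 4)),
    index=pd.timedelta_range(start="0s", periods=10000, freq="3906250ns"),
)
result = df.loc["1s":, :].resample("3s").apply(peak_fft_magnitude)
print(result.tail(3))
```

With the guard in place, an empty bin shows up as a NaN row rather than an exception that empties the whole result. On a pandas version without the trailing empty group, every row is finite.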

mroeschke · 2020-05-11T18:51:59Z

This looks to work on master now. Could use a test.
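A regression test along the requested lines could look like the sketch below. It is hypothetical: the test name, the seed, and the use of `Resampler.size()` are choices made for illustration, not necessarily what was merged in #35799. The expected values follow from 256 rows per second: 12 full 3 s bins of 768 rows, then a final bin of 528 rows labelled 37 s, and no bin at 40 s.

```python
import numpy as np
import pandas as pd


def test_resample_timedelta_has_no_trailing_empty_group():
    # Hypothetical regression test for GH 10603.
    rng = np.random.default_rng(10603)
    df = pd.DataFrame(
        rng.normal(size=(10000, 4)),
        index=pd.timedelta_range(start="0s", periods=10000, freq="3906250ns"),
    )
    counts = df.loc["1s":, :].resample("3s").size()

    # 9744 rows = 12 full bins of 768 rows + one final bin of 528 rows.
    assert counts.tolist() == [768] * 12 + [528]
    # Labels run from 1 s to 37 s in 3 s steps; nothing is labelled 40 s.
    expected_index = pd.timedelta_range(start="1s", periods=13, freq="3s")
    assert counts.index.equals(expected_index)
    # Every bin holds at least one row.
    assert (counts > 0).all()
```

Under pytest the function is collected automatically by its `test_` prefix. It can also be called directly.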

In [17]: import pandas as pd
    ...: import numpy as np
    ...:
    ...: df = pd.DataFrame(np.random.normal(size=(10000,4)))
    ...: df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')
    ...:
    ...: df.loc['1s':,:].resample('3s').apply(lambda x: len(x))
Out[17]:
                     0      1      2      3
0 days 00:00:01  768.0  768.0  768.0  768.0
0 days 00:00:04  768.0  768.0  768.0  768.0
0 days 00:00:07  768.0  768.0  768.0  768.0
0 days 00:00:10  768.0  768.0  768.0  768.0
0 days 00:00:13  768.0  768.0  768.0  768.0
0 days 00:00:16  768.0  768.0  768.0  768.0
0 days 00:00:19  768.0  768.0  768.0  768.0
0 days 00:00:22  768.0  768.0  768.0  768.0
0 days 00:00:25  768.0  768.0  768.0  768.0
0 days 00:00:28  768.0  768.0  768.0  768.0
0 days 00:00:31  768.0  768.0  768.0  768.0
0 days 00:00:34  768.0  768.0  768.0  768.0
0 days 00:00:37  528.0  528.0  528.0  528.0

rmsmani · 2020-07-29T23:37:44Z

@mroeschke
Tested in the latest version; I get the same result as above.

rmsmani · 2020-08-02T08:43:08Z

I think we can close this issue

simonjayhawkins · 2020-08-02T10:48:04Z

> I think we can close this issue

The issue is tagged as Needs Tests. If you can raise a PR adding a test to prevent regressions, we can then close this issue.

tkmz-n · 2020-08-19T07:35:08Z

take

…dev#35799)

* REF: remove unnecesary try/except * TST: add test for agg on ordered categorical cols (#35630) * TST: resample does not yield empty groups (#10603) (#35799) * revert accidental rebase * REF: use BlockManager.apply for Rolling.count Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com> Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

…ce (#35899) * REF: remove unnecesary try/except * TST: add test for agg on ordered categorical cols (#35630) * TST: resample does not yield empty groups (#10603) (#35799) * revert accidental rebase * REF: handle axis=None cases inside DataFrame.all/any * annotate * dummy commit to force Travis Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com> Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

* REF: remove unnecesary try/except * TST: add test for agg on ordered categorical cols (#35630) * TST: resample does not yield empty groups (#10603) (#35799) * revert accidental rebase * BUG: BlockSlider not clearing index._cache * update whatsnew Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com> Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

…36045) * REF: remove unnecesary try/except * TST: add test for agg on ordered categorical cols (#35630) * TST: resample does not yield empty groups (#10603) (#35799) * revert accidental rebase * BUG: NDFrame.replace wrong exception type, wrong return when size==0 * bool->bool_t * whatsnew Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com> Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

sinhrks added Usage Question Resample resample method labels Jul 16, 2015

mroeschke added Apply Apply, Aggregate, Transform Bug and removed Usage Question labels Oct 9, 2019

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Apply Apply, Aggregate, Transform Bug Resample resample method labels May 11, 2020

simonjayhawkins added this to the Contributions Welcome milestone Jul 31, 2020

github-actions bot assigned tkmz-n Aug 19, 2020

tkmz-n added a commit to tkmz-n/pandas that referenced this issue Aug 19, 2020

TST: resample does not yield empty groups (pandas-dev#10603)

53252a0

tkmz-n mentioned this issue Aug 19, 2020

TST: resample does not yield empty groups (#10603) #35799

Merged

tkmz-n added a commit to tkmz-n/pandas that referenced this issue Aug 20, 2020

TST: resample does not yield empty groups (pandas-dev#10603)

d7a392d

jreback modified the milestones: Contributions Welcome, 1.2 Aug 21, 2020

jreback closed this as completed in #35799 Aug 21, 2020

jreback pushed a commit that referenced this issue Aug 21, 2020

TST: resample does not yield empty groups (#10603) (#35799)

e2a622c

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this issue Aug 22, 2020

TST: resample does not yield empty groups (pandas-dev#10603) (pandas-…

315d5ce

…dev#35799)

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this issue Aug 23, 2020

TST: resample does not yield empty groups (pandas-dev#10603) (pandas-…

47121dd

…dev#35799)
