Extra Bin with Pandas Resample in 0.11.0 #4076

waltaskew · 2013-06-28T17:53:55Z

I've got a pandas data frame defined like this, using pandas 0.11.0:

    import datetime

    import pandas

    # 28 consecutive days starting 2001-05-04
    last_4_weeks_range = pandas.date_range(
            start=datetime.datetime(2001, 5, 4), periods=28)
    # Two restaurants (REST_KEY 1 and 2), one row per restaurant per day
    last_4_weeks = pandas.DataFrame(
        [{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90,
            'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 +
        [{'REST_KEY': 2, 'DLY_TRN_QT': 70, 'DLY_SLS_AMT': 10,
            'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28,
        index=last_4_weeks_range.append(last_4_weeks_range))
    # With no arguments, sort() orders the rows by index in pandas 0.11.0
    last_4_weeks.sort(inplace=True)

and when I go to resample it:

In [265]: last_4_weeks.resample('7D', how='sum')
Out[265]: 
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
2001-05-04               280              560          700        1050        21
2001-05-11               280              560          700        1050        21
2001-05-18               280              560          700        1050        21
2001-05-25               280              560          700        1050        21
2001-06-01                 0                0            0           0         0

I end up with an extra empty bin, 2001-06-01, that I wouldn't expect to see, since my 28 days divide evenly into the 7-day bins I'm resampling to. I've tried changing the closed kwarg, but I can't get rid of that extra bin. This seems like a bug, and it throws off my mean calculations when I try to do

In [266]: last_4_weeks.groupby('REST_KEY').resample('7D', how='sum').mean(level=0)
Out[266]: 
          COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
REST_KEY                                                                      
1                      112              168          504         448       5.6
2                      112              280           56         392      11.2

as the numbers are being divided by 5 rather than 4. (I also wouldn't expect REST_KEY to show up in the aggregation columns as it's part of the groupby, but that's really a smaller problem.)
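
To make the "divided by 5 rather than 4" point concrete, here is a minimal arithmetic sketch in plain Python, using only figures from the post: REST_KEY 1 has COOP_DLY_SLS_AMT of 20 per day, so each real 7-day bin sums to 140. The variable names are illustrative and are not part of the original session.

```python
# Weekly sum for REST_KEY 1, COOP_DLY_SLS_AMT: 20 per day over 7 days.
daily_value = 20
weekly_sum = daily_value * 7            # 140

expected_bins = [weekly_sum] * 4        # the four real 7-day bins
observed_bins = expected_bins + [0]     # plus the spurious empty bin

expected_mean = sum(expected_bins) / len(expected_bins)
observed_mean = sum(observed_bins) / len(observed_bins)

print(expected_mean)  # 140.0
print(observed_mean)  # 112.0, the value reported in Out[266]
```

The extra zero-valued bin leaves the total unchanged but increases the divisor, which is why every mean in Out[266] is 4/5 of the value one would expect.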

waltaskew · 2013-07-03T14:30:51Z

This is curiously not the case if I pass how='count' -- no extra bin is returned. This makes me suspect a bug:

In [8]: last_4_weeks.resample('7D', how='count')
Out[8]: 
2001-05-04  COOP_DLY_SLS_AMT    14
            COOP_DLY_TRN_QT     14
            DLY_SLS_AMT         14
            DLY_TRN_QT          14
            REST_KEY            14
2001-05-11  COOP_DLY_SLS_AMT    14
            COOP_DLY_TRN_QT     14
            DLY_SLS_AMT         14
            DLY_TRN_QT          14
            REST_KEY            14
2001-05-18  COOP_DLY_SLS_AMT    14
            COOP_DLY_TRN_QT     14
            DLY_SLS_AMT         14
            DLY_TRN_QT          14
            REST_KEY            14
2001-05-25  COOP_DLY_SLS_AMT    14
            COOP_DLY_TRN_QT     14
            DLY_SLS_AMT         14
            DLY_TRN_QT          14
            REST_KEY            14
dtype: int64

cpcloud · 2013-07-03T15:03:32Z

A somewhat related issue in master is that the extra bin no longer contains zeros; it contains garbage values.

This is a bug in how the Python and Cythonized methods differ. For example, passing a lambda works:

In [5]: last_4_weeks.resample('7D', how=lambda x: numpy.mean(x))
Out[5]:
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  \
2001-05-04                20               40           50          75
2001-05-11                20               40           50          75
2001-05-18                20               40           50          75
2001-05-25                20               40           50          75

            REST_KEY
2001-05-04       1.5
2001-05-11       1.5
2001-05-18       1.5
2001-05-25       1.5
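
For readers on later pandas releases, the built-in versus lambda comparison above can be sketched with the method-chaining resample API. This is an assumption-laden sketch, not the session from this thread: it assumes a pandas version in which `how=` has been replaced by calls such as `.resample('7D').sum()` and `DataFrame.sort` by `sort_index`, and the names `frame`, `builtin` and `via_lambda` are illustrative.

```python
import datetime

import pandas as pd

# Rebuild the frame from the original report.
days = pd.date_range(start=datetime.datetime(2001, 5, 4), periods=28)
frame = pd.DataFrame(
    [{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90,
      'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 +
    [{'REST_KEY': 2, 'DLY_TRN_QT': 70, 'DLY_SLS_AMT': 10,
      'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28,
    index=days.append(days)).sort_index()

# Built-in aggregation (the Cythonized path in the discussion above).
builtin = frame.resample('7D').sum()

# The same aggregation routed through a Python-level callable.
via_lambda = frame.resample('7D').apply(lambda x: x.sum())

# On a release that includes the fix, both should report four bins.
print(len(builtin), len(via_lambda))
print(builtin['COOP_DLY_SLS_AMT'].tolist())
```

If the two lengths differ, or a trailing 2001-06-01 row appears, the installed version still shows the behaviour reported in this issue.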

waltaskew · 2013-07-16T23:46:53Z

This also seems to behave differently at different resample frequencies. With a frequency of 'AS', the built-in how='mean' and how='sum' yield the correct answers, while the equivalent lambdas (how=lambda x: numpy.mean(x) and how=lambda x: numpy.sum(x)) do not:

In [14]: last_4_weeks.resample('AS', how='mean')
Out[14]: 
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
2001-01-01                20               40           50          75       1.5

In [15]: last_4_weeks.resample('AS', how=lambda x: numpy.mean(x))
Out[15]: 
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
2001-01-01               NaN              NaN          NaN         NaN       NaN

In [16]: last_4_weeks.resample('AS', how='sum')
Out[16]: 
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
2001-01-01              1120             2240         2800        4200        84

In [17]: last_4_weeks.resample('AS', how=lambda x: numpy.sum(x))
Out[17]: 
            COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
2001-01-01                 0                0            0           0         0

cpcloud · 2013-08-01T14:01:25Z

your last example is an issue with NaN handling

krapfn · 2013-08-26T13:42:50Z

I have also been having issues with resample adding extra bins (also in 0.11.0), and just thought I'd add that I can see it even when the number of days is not evenly divisible by the bin width:

>>> x = pandas.DataFrame(numpy.random.randn(9, 3), index=pandas.date_range('2000-1-1', periods=9))
>>> x
                   0         1         2
2000-01-01 -1.191405  0.645320  1.308088
2000-01-02  1.229103 -0.727613  0.488344
2000-01-03  0.885808  1.381995 -0.955914
2000-01-04 -1.013526 -0.225070 -0.163507
2000-01-05  0.670316 -0.828281 -0.233381
2000-01-06  1.357537  1.446020 -0.661463
2000-01-07  0.335799  0.952127  0.591679
2000-01-08 -0.083534  1.025077 -0.146682
2000-01-09 -1.338294  1.919551  0.446385
>>> x.resample('5D')
                   0         1              2
2000-01-01  0.116059  0.049270   8.872589e-02
2000-01-06  0.067877  1.335694   5.747979e-02
2000-01-11  0.591679  0.146682  3.952525e-322

I don't have any particular insight to add, but maybe this extra info will help...
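
As a reference point for both reports, the number of bins one would expect from fixed-width daily resampling can be worked out without pandas. The helper below, `expected_bin_labels`, is hypothetical and written only for this illustration; it assumes bins are labelled by their left edge and start on the first day of the data, which matches the labels shown in the outputs above.

```python
import datetime
import math


def expected_bin_labels(start, n_days, width_days):
    """Left-edge labels of the fixed-width bins needed to cover n_days."""
    n_bins = math.ceil(n_days / width_days)
    return [start + datetime.timedelta(days=i * width_days)
            for i in range(n_bins)]


# 28 days in 7-day bins (the original report).
labels_28 = expected_bin_labels(datetime.date(2001, 5, 4), 28, 7)
# 9 days in 5-day bins (the example just above).
labels_9 = expected_bin_labels(datetime.date(2000, 1, 1), 9, 5)

print(len(labels_28), labels_28[-1])  # 4 2001-05-25
print(len(labels_9), labels_9[-1])    # 2 2000-01-06
```

By this count, the 2001-06-01 row in the first report and the 2000-01-11 row here each lie one bin past the end of the data.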

jreback · 2013-09-28T19:33:18Z

@cpcloud 0.13 or push?

cpcloud · 2013-09-28T19:35:11Z

like to do 0.13 but got a lot on my plate already ... let me see if there's anything else i can push to 0.14 in favor of this

jreback · 2013-09-28T19:38:02Z

up2u

jreback · 2013-10-04T20:21:16Z

pushing for now...can always pull back!

cpcloud · 2013-10-04T20:34:03Z

Ok

ghost assigned cpcloud Jul 3, 2013

cpcloud mentioned this issue Nov 4, 2013

Resampling a Series with a timezone using kind='period' Crashes with ~6000 Values #5430

Closed

jreback mentioned this issue Jan 14, 2014

Inconsistent behaviour in resample between daily and weekly #5937

Closed

jreback mentioned this issue Mar 22, 2014

BUG: Bug in resample with extra bins when using an evenly divisible freq (GH4076) #6690

Merged

jreback closed this as completed in #6690 Mar 23, 2014

wesm unassigned cpcloud Oct 12, 2016
