BUG: Unable to aggregate TimeGrouper #7453

sinhrks · 2014-06-14T01:37:22Z

Derived from #7373. There seem to be three issues related to TimeGrouper aggregation.

1. var, std, mean

var, std, and mean raise ValueError when the group key contains NaT.

import datetime

import pandas as pd
import numpy as np

data = np.random.randn(20, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
df['dt'] = [datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 2),
            datetime.datetime(2013, 1, 3), datetime.datetime(2013, 1, 4),
            datetime.datetime(2013, 1, 5)] * 4
df['dt_nat'] = [datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 2),
                pd.NaT, datetime.datetime(2013, 1, 4),
                datetime.datetime(2013, 1, 5)] * 4

df.groupby(pd.TimeGrouper(key='dt', freq='D')).mean()
# OK
df.groupby(pd.TimeGrouper(key='dt_nat', freq='D')).mean()
# ValueError: month must be in 1..12
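For readers on a pandas release where pd.TimeGrouper has been deprecated and later removed, the following is a minimal sketch of the same NaT case using pd.Grouper instead. The seeded random generator and the restriction to columns A to D are assumptions added for the illustration; they are not part of the original report. On such a release, the rows whose key is NaT are expected to be left out of every bin rather than raising.

```python
import datetime

import numpy as np
import pandas as pd

# Seeded data so the sketch is repeatable (assumption; the report used np.random.randn).
rng = np.random.default_rng(0)
cols = ["A", "B", "C", "D"]
df = pd.DataFrame(rng.standard_normal((20, 4)), columns=cols)

# Same key layout as the report: every third entry in each block of five is NaT.
df["dt_nat"] = [
    datetime.datetime(2013, 1, 1),
    datetime.datetime(2013, 1, 2),
    pd.NaT,
    datetime.datetime(2013, 1, 4),
    datetime.datetime(2013, 1, 5),
] * 4

# pd.Grouper(key=..., freq=...) stands in for pd.TimeGrouper here.
result = df.groupby(pd.Grouper(key="dt_nat", freq="D"))[cols].mean()
print(result)
```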

2. size (#7600)

size raises AttributeError whether or not the key contains NaT.

df.groupby(pd.TimeGrouper(key='dt', freq='D')).size()
# AttributeError: 'BinGrouper' object has no attribute 'groupings'
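As a rough check of what size should return once this works, here is a small sketch that again assumes pd.Grouper as the stand-in for pd.TimeGrouper. The frame is rebuilt with only the dt key and one hypothetical value column so that the block runs on its own.

```python
import datetime

import numpy as np
import pandas as pd

# Five consecutive days repeated four times, as in the report.
days = [datetime.datetime(2013, 1, d) for d in range(1, 6)] * 4
df = pd.DataFrame({"A": np.arange(20.0), "dt": days})

# Each daily bin should contain exactly four rows.
counts = df.groupby(pd.Grouper(key="dt", freq="D")).size()
print(counts)
```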

3. first, last, nth

These appear to work, but TimeGrouper returns a different result from a normal groupby.

df.groupby('dt').first()
#                    A         B         C         D  key     dt_nat
# dt                                                                
#2013-01-01 -1.868691 -0.554116 -0.094949  0.009740    1 2013-01-01
#2013-01-02  0.272139 -0.106543  1.319331 -0.532377    2 2013-01-02
#2013-01-03 -1.637544  2.699557 -0.164414 -1.451295    3        NaT
#2013-01-04  1.642609 -0.313832  0.494468 -0.698104    4 2013-01-04
#2013-01-05 -1.554106  1.230299 -1.408515 -0.000722    5 2013-01-05


df.groupby(pd.TimeGrouper(key='dt', freq='D')).first()
#                    A         B         C         D  key     dt_nat
# dt                                                                
#2013-01-01 -1.868691 -0.554116 -0.094949  0.009740    1 2013-01-01
#2013-01-02  0.272139 -0.106543  1.319331 -0.532377    2 2013-01-02
#2013-01-03 -1.637544  2.699557 -0.164414 -1.451295    3        NaT
#2013-01-04  1.642609 -0.313832  0.494468 -0.698104    4 2013-01-04
#2013-01-05 -0.024332  1.668172 -0.328200  1.731480    5 2013-01-05

# Compare the 5th row (2013-01-05)

I assume the difference arises because BinGrouper sorts rows differently from a normal groupby, so the results of a normal groupby and TimeGrouper can differ.

df.groupby('dt').get_group(datetime.datetime(2013, 1, 5))
#            A         B         C         D         dt     dt_nat
#4   0.632937  0.224670 -0.201186 -0.340428 2013-01-05 2013-01-05
#9  -1.238944 -0.031075 -1.173326 -0.314716 2013-01-05 2013-01-05
#14  2.108985  0.993430  1.300605  1.452049 2013-01-05 2013-01-05
#19  0.315452 -0.817634 -0.526728  0.201415 2013-01-05 2013-01-05

df.groupby(pd.TimeGrouper(key='dt', freq='D')).get_group(datetime.datetime(2013, 1, 5))
#            A         B         C         D         dt     dt_nat
#9  -1.238944 -0.031075 -1.173326 -0.314716 2013-01-05 2013-01-05
#4   0.632937  0.224670 -0.201186 -0.340428 2013-01-05 2013-01-05
#14  2.108985  0.993430  1.300605  1.452049 2013-01-05 2013-01-05
#19  0.315452 -0.817634 -0.526728  0.201415 2013-01-05 2013-01-05
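The effect of sort stability on first can be illustrated without pandas. The sketch below is only an illustration of the idea, not pandas' internal grouping code: with a stable sort, rows that share a group label keep their original order, so the first row of each group is the one with the lowest original position. An unstable sort gives no such guarantee, which would be consistent with row 9 appearing before row 4 in the output above.

```python
import numpy as np

# Group labels for 20 rows cycling through 5 days, as in the report.
labels = np.tile(np.arange(5), 4)

# A stable sort keeps rows with equal labels in their original order.
order = np.argsort(labels, kind="stable")

# Position in the sorted order where each group begins.
starts = np.searchsorted(labels[order], np.arange(5), side="left")

# Original row number that "first" would pick for each group.
first_rows = order[starts]
print(first_rows)

# Original row numbers of the last group (2013-01-05), in sorted order.
last_group_rows = order[starts[4]:]
print(last_group_rows)
```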


jreback · 2016-02-17T13:29:53Z

first 2 look fixed, just need validation tests. Then can deal with 3rd issue separately.

jreback · 2016-04-10T14:05:37Z

we still need tests for the first 2 parts of this issue (validation tests), yes?

sinhrks · 2016-04-10T17:17:40Z

No, they are tested here:

https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_resample.py#L2390

The last remaining part is nth, and I'll enable it once #11039 is merged (then close #12839 and complete this).

benrifkind · 2016-05-19T16:08:26Z

Not sure if this fits in here but I have another issue with nth when I groupby a TimeGrouper and another categorical variable. The categorical variable gets dropped in the aggregation step.

Here's an example

df = pd.DataFrame({'cat': ['cat0']*2 + ['cat1']*2,
                   'date': [pd.datetime(2016,1,1), pd.datetime(2016,1,2)]*2,
                   'val': np.arange(1,5)})

#     cat       date  val
# 0  cat0 2016-01-01    1
# 1  cat0 2016-01-02    2
# 2  cat1 2016-01-01    3
# 3  cat1 2016-01-02    4

This works like I would expect

df.set_index("date").groupby([pd.TimeGrouper("2D"), "cat"]).last()

# date    cat  val
# 2016-01-01  cat0    2
# 2016-01-01  cat1    4

But this does not

df.set_index("date").groupby([pd.TimeGrouper("2D"), "cat"]).nth(-1)

# date   val 
# 2016-01-02  2
# 2016-01-02  4

jreback · 2016-05-19T18:04:25Z

you might be using an older version

In [1]: df = pd.DataFrame({'cat': ['cat0']*2 + ['cat1']*2, 
              'date':[pd.datetime(2016,1,1), pd.datetime(2016,1,2)]*2,
             'val':np.arange(1,5)})

In [2]: df.set_index("date").groupby([pd.TimeGrouper("2D"), "cat"]).last()
Out[2]: 
                 val
date       cat      
2016-01-01 cat0    2
           cat1    4

In [3]: df.set_index("date").groupby([pd.TimeGrouper("2D"), "cat"]).nth(-1)
Out[3]: 
                 val
date       cat      
2016-01-01 cat0    2
           cat1    4

In [4]: pd.__version__
Out[4]: u'0.18.1'

benrifkind · 2016-05-19T19:57:17Z

Yup. You're right. Just updated from 0.18.0 to 0.18.1 and it works. Thanks.
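For anyone re-running the example above on a release without pd.TimeGrouper or pd.datetime, here is a minimal sketch of the last() case, assuming pd.Grouper(freq="2D") and the standard datetime module as replacements. It covers only last(); nth is left out of the sketch.

```python
import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cat": ["cat0"] * 2 + ["cat1"] * 2,
    "date": [datetime.datetime(2016, 1, 1), datetime.datetime(2016, 1, 2)] * 2,
    "val": np.arange(1, 5),
})

# Group by two-day bins on the index together with the "cat" column.
last = df.set_index("date").groupby([pd.Grouper(freq="2D"), "cat"]).last()
print(last)
```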

sinhrks mentioned this issue Jun 14, 2014

BUG: resample raises ValueError when NaT is included #7373

Merged

jreback added Bug labels Jun 14, 2014

jreback added this to the 0.14.1 milestone Jun 14, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jun 26, 2014

This was referenced Jun 28, 2014

API/BUG: Make consistent datetime string parse function #7599

Merged

BUG: GroupBy.size created by TimeGrouper raises AttributeError #7600

Merged

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback added the Testing (pandas testing functions or related to the test suite) and Difficulty Novice labels Feb 17, 2016

jreback modified the milestones: 0.18.1, Next Major Release Feb 17, 2016

This was referenced Apr 9, 2016

GroupBy.nth includes group key inconsistently #12839

Closed

BUG: GroupBy with TimeGrouper sorts unstably #12840

Closed

jreback closed this as completed in ea9a5a8 Apr 10, 2016
