Inconsistent return type when grouping dates by frequency with custom reduction function #11742

Closed
stephen-hoover opened this issue Dec 2, 2015 · 3 comments
@stephen-hoover (Contributor)

If I group a DataFrame by a column of dates, the return type varies depending on whether I just group or also specify a frequency in the Grouper.

Grouping without a frequency returns a DataFrame when the applied function returns a labeled Series, and a Series when it returns a scalar:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})

In [3]: def sumfunc(x):
   ...:     return pd.Series([x['value'].sum()], ('sum',))
   ...: 

In [4]: df.groupby(pd.Grouper(key='date')).apply(sumfunc)
Out[4]: 
            sum
date           
10/10/2000   10
11/10/2000   13

In [5]: type(df.groupby(pd.Grouper(key='date')).apply(sumfunc))
Out[5]: pandas.core.frame.DataFrame

In [17]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
Out[17]: 
date
2000-10-10    10
2000-11-10    13
dtype: int64

In [18]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
Out[18]: pandas.core.series.Series

If I apply a frequency in the Grouper, I instead get a Series with a MultiIndex when the function returns a labeled Series, and a TypeError when it returns a scalar.

In [6]: df['date'] = pd.to_datetime(df['date'])

In [7]: df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc)
Out[7]: 
date           
2000-10-31  sum    10
2000-11-30  sum    13
dtype: int64

In [8]: type(df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc))
Out[8]: pandas.core.series.Series

In [16]: df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-ad73d0ebc475> in <module>()
----> 1 df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    713         # ignore SettingWithCopy here in case the user mutates
    714         with option_context('mode.chained_assignment',None):
--> 715             return self._python_apply_general(f)
    716 
    717     def _python_apply_general(self, f):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    720 
    721         return self._wrap_applied_output(keys, values,
--> 722                                          not_indexed_same=mutated)
    723 
    724     def aggregate(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3253             # Handle cases like BinGrouper
   3254             return self._concat_objects(keys, values,
-> 3255                                         not_indexed_same=not_indexed_same)
   3256 
   3257     def _transform_general(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
   1271                 group_names = self.grouper.names
   1272                 result = concat(values, axis=self.axis, keys=group_keys,
-> 1273                                 levels=group_levels, names=group_names)
   1274             else:
   1275 

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    810                        keys=keys, levels=levels, names=names,
    811                        verify_integrity=verify_integrity,
--> 812                        copy=copy)
    813     return op.get_result()
    814 

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    866         for obj in objs:
    867             if not isinstance(obj, NDFrame):
--> 868                 raise TypeError("cannot concatenate a non-NDFrame object")
    869 
    870             # consolidate

TypeError: cannot concatenate a non-NDFrame object

Since in this example assigning dates to months leaves the groups unchanged, I would have expected identical results whether or not I set freq='M'. I'm guessing the difference is that freq='M' causes an extra groupby to happen under the hood, yes? What I expected when I ran into this was for pd.Grouper(freq='M', key='date') to do a single groupby, combining rows whose dates fall in the same month.
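
A possible workaround (an editor's sketch, not from the original report) is to group on an explicitly derived month key rather than passing freq= to the Grouper. Assuming the only goal is one group per calendar month, this stays on the plain-groupby path and should keep the DataFrame return type:

import pandas as pd

df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})
df['date'] = pd.to_datetime(df['date'])

def sumfunc(x):
    return pd.Series([x['value'].sum()], ('sum',))

# Group on a derived month key instead of passing freq= to the Grouper.
# This is a single groupby over the month labels, so a Series-returning
# function still comes back as a DataFrame.
monthly = df.groupby(df['date'].dt.to_period('M')).apply(sumfunc)
print(type(monthly))  # <class 'pandas.core.frame.DataFrame'>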

Pandas version:

In [9]: pd.__version__
Out[9]: '0.17.1+22.g0c43fcc'
@jreback (Contributor) commented Dec 2, 2015

I guess. This is a quite tricky code path; you're welcome to take a stab at making them consistent.

Keep in mind that .apply may not always be able to do the same thing, since it has to infer return shapes and such.

You should avoid custom functions like this anyway, as they are non-performant.
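
For illustration (an editor's sketch, not part of the original comment), the scalar case in the report can be written with the built-in sum, which goes through the groupby aggregation machinery rather than .apply, so no return-shape inference is involved:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['10/10/2000', '11/10/2000']),
                   'value': [10, 13]})

# Built-in reduction on the grouped column: the sum is computed per
# monthly bin without going through .apply.
df.groupby(pd.Grouper(freq='M', key='date'))['value'].sum()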

@jreback jreback added Groupby Dtype Conversions Unexpected or buggy dtype conversions API Design Resample resample method labels Dec 2, 2015
@jreback jreback added this to the Next Major Release milestone Dec 2, 2015
@jreback (Contributor) commented Dec 2, 2015

xref #9867

@stephen-hoover (Contributor, Author)

I might be able to take a look at this over Christmas, but I think I'll be too busy before then.

I wouldn't use a custom function for something like sum, but sometimes I have aggregations which aren't built-in.
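
As an illustration of that kind of case (again a sketch added in editing, not from the thread), a reduction with no built-in equivalent can at least be routed through .agg rather than .apply, assuming the aggregation path behaves consistently for this Grouper in the pandas version in use:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['10/10/2000', '11/10/2000']),
                   'value': [10, 13]})

# A reduction with no built-in equivalent (a hypothetical sum of squares),
# expressed through .agg so each group is collapsed to a scalar without
# using .apply.
df.groupby(pd.Grouper(freq='M', key='date'))['value'].agg(lambda v: (v ** 2).sum())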
