Inconsistent return type when grouping dates by frequency with custom reduction function #11742

stephen-hoover opened this issue Dec 2, 2015


Dec 2, 2015

If I group a DataFrame by a column of dates, the return type varies depending on whether I just group or whether I also apply a frequency in the Grouper.

Grouping without resampling dates returns a DataFrame when I apply a function which returns a labeled Series, or a Series if the function returns a scalar:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})

In [3]: def sumfunc(x):
   ...:     return pd.Series([x['value'].sum()], ('sum',))

In [4]: df.groupby(pd.Grouper(key='date')).apply(sumfunc)
10/10/2000   10
11/10/2000   13

In [5]: type(df.groupby(pd.Grouper(key='date')).apply(sumfunc))
Out[5]: pandas.core.frame.DataFrame

In [17]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
2000-10-10    10
2000-11-10    13
dtype: int64

In [18]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
Out[18]: pandas.core.series.Series

If I apply a frequency in the Grouper, I get a Series with a MultiIndex when the function returns a labeled Series, or a TypeError when it returns a scalar:

In [6]: df['date'] = pd.to_datetime(df['date'])

In [7]: df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc)
2000-10-31  sum    10
2000-11-30  sum    13
dtype: int64

In [8]: type(df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc))
Out[8]: pandas.core.series.Series

In [16]: df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
TypeError                                 Traceback (most recent call last)
<ipython-input-16-ad73d0ebc475> in <module>()
----> 1 df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/ in apply(self, func, *args, **kwargs)
    713         # ignore SettingWithCopy here in case the user mutates
    714         with option_context('mode.chained_assignment',None):
--> 715             return self._python_apply_general(f)
    717     def _python_apply_general(self, f):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/ in _python_apply_general(self, f)
    721         return self._wrap_applied_output(keys, values,
--> 722                                          not_indexed_same=mutated)
    724     def aggregate(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/ in _wrap_applied_output(self, keys, values, not_indexed_same)
   3253             # Handle cases like BinGrouper
   3254             return self._concat_objects(keys, values,
-> 3255                                         not_indexed_same=not_indexed_same)
   3257     def _transform_general(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/ in _concat_objects(self, keys, values, not_indexed_same)
   1271                 group_names = self.grouper.names
   1272                 result = concat(values, axis=self.axis, keys=group_keys,
-> 1273                                 levels=group_levels, names=group_names)
   1274             else:

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/ in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    810                        keys=keys, levels=levels, names=names,
    811                        verify_integrity=verify_integrity,
--> 812                        copy=copy)
    813     return op.get_result()

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/ in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    866         for obj in objs:
    867             if not isinstance(obj, NDFrame):
--> 868                 raise TypeError("cannot concatenate a non-NDFrame object")
    870             # consolidate

TypeError: cannot concatenate a non-NDFrame object

In this example, assigning dates to months still leaves the same groups, so I would have expected identical results whether or not I set freq='M'. I'm guessing the difference is that freq='M' causes an extra groupby to happen under the hood, yes? When I ran into this, I expected pd.Grouper(freq='M', key='date') to do a single groupby, combining rows whose dates happen to fall in the same month.
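A minimal sketch of one possible way to sidestep the inconsistency (it does not fix `apply` itself): select the column first and pass the custom reduction to `agg`. The helper name `total` is hypothetical, and `pd.offsets.MonthEnd()` is used in place of `freq='M'` on the assumption that it groups the same way while avoiding frequency-alias differences between pandas versions.

```python
import pandas as pd

# Same data as in the report, with the dates already parsed.
df = pd.DataFrame({
    'date': pd.to_datetime(['10/10/2000', '11/10/2000']),
    'value': [10, 13],
})


def total(values):
    # Hypothetical stand-in for a custom reduction that returns a scalar.
    return values.sum()


# Group on the raw dates, then reduce the selected column.
plain = df.groupby(pd.Grouper(key='date'))['value'].agg(total)

# Group into month-end bins, then reduce the selected column the same way.
monthly = df.groupby(
    pd.Grouper(key='date', freq=pd.offsets.MonthEnd())
)['value'].agg(total)

print(type(plain), type(monthly))
```

With this data both results should be a Series holding one value per group, so downstream code would not need to branch on the return type.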

Pandas version:

In [9]: pd.__version__
Out[9]: '0.17.1+22.g0c43fcc'

Dec 2, 2015

I guess so. This is quite a tricky code path. You're welcome to take a stab at making them consistent.

Keep in mind that .apply may not always be able to do the same thing, since it has to infer return shapes and such.

You should avoid custom functions like this anyway, as they are non-performant.

jreback added this to the Next Major Release milestone Dec 2, 2015


Dec 2, 2015

xref #9867


Contributor Author

Dec 2, 2015

I might be able to take a look at this over Christmas, but I think I'll be too busy before then.

I wouldn't use a custom function for something like sum, but sometimes I have aggregations which aren't built-in.
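For aggregations that aren't built in, a sketch of one option is named aggregation on the selected column, which returns a DataFrame with one labeled column per reduction. The extra rows, the helper `value_range`, and the output names `total` and `spread` are all made up for illustration, and `pd.offsets.MonthEnd()` again stands in for `freq='M'`.

```python
import pandas as pd

# Illustrative data: two rows per month so the custom reduction is non-trivial.
df = pd.DataFrame({
    'date': pd.to_datetime(
        ['2000-10-10', '2000-10-20', '2000-11-10', '2000-11-25']
    ),
    'value': [10, 4, 13, 8],
})


def value_range(values):
    # Hypothetical custom aggregation with no built-in equivalent by name.
    return values.max() - values.min()


# Named aggregation: each keyword becomes a column in the result.
summary = df.groupby(
    pd.Grouper(key='date', freq=pd.offsets.MonthEnd())
)['value'].agg(total='sum', spread=value_range)

print(summary)
```

This keeps the labeled output that a Series-returning function gives with `apply`, without relying on `apply` to infer the result shape.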

jreback modified the milestones: 0.18.1, Next Major Release, 0.18.0 Feb 17, 2016

jreback closed this in 2c79a50 Apr 1, 2016

