If I group a DataFrame by a column of dates, the return type varies depending on whether I just group or whether I also apply a frequency in the Grouper.
Grouping without resampling dates returns a DataFrame when I apply a function which returns a labeled Series, or a Series if the function returns a scalar:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})
In [3]: def sumfunc(x):
   ...:     return pd.Series([x['value'].sum()], ('sum',))
   ...:
In [4]: df.groupby(pd.Grouper(key='date')).apply(sumfunc)
Out[4]:
sum
date
10/10/2000 10
11/10/2000 13
In [5]: type(df.groupby(pd.Grouper(key='date')).apply(sumfunc))
Out[5]: pandas.core.frame.DataFrame
In [17]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
Out[17]:
date
2000-10-10 10
2000-11-10 13
dtype: int64
In [18]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
Out[18]: pandas.core.series.Series
If I apply a frequency in the Grouper, I get a Series with a multi-index when the function returns a labeled Series, or a TypeError when it returns a scalar:
In [6]: df['date'] = pd.to_datetime(df['date'])
In [7]: df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc)
Out[7]:
date
2000-10-31 sum 10
2000-11-30 sum 13
dtype: int64
In [8]: type(df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc))
Out[8]: pandas.core.series.Series
In [16]: df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-ad73d0ebc475> in <module>()
----> 1 df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
713 # ignore SettingWithCopy here in case the user mutates
714 with option_context('mode.chained_assignment',None):
--> 715 return self._python_apply_general(f)
716
717 def _python_apply_general(self, f):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
720
721 return self._wrap_applied_output(keys, values,
--> 722 not_indexed_same=mutated)
723
724 def aggregate(self, func, *args, **kwargs):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
3253 # Handle cases like BinGrouper
3254 return self._concat_objects(keys, values,
-> 3255 not_indexed_same=not_indexed_same)
3256
3257 def _transform_general(self, func, *args, **kwargs):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
1271 group_names = self.grouper.names
1272 result = concat(values, axis=self.axis, keys=group_keys,
-> 1273 levels=group_levels, names=group_names)
1274 else:
1275
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
810 keys=keys, levels=levels, names=names,
811 verify_integrity=verify_integrity,
--> 812 copy=copy)
813 return op.get_result()
814
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
866 for obj in objs:
867 if not isinstance(obj, NDFrame):
--> 868 raise TypeError("cannot concatenate a non-NDFrame object")
869
870 # consolidate
TypeError: cannot concatenate a non-NDFrame object
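For comparison, the multi-index Series in Out[7] carries the same data as the DataFrame in Out[4], and `unstack()` converts one shape into the other. This is only a sketch: the Series is rebuilt by hand from the printed output (the names `stacked` and `frame` are made up here) rather than produced by `apply`, since the shape `apply` returns may differ between pandas versions.

```python
import pandas as pd

# Rebuild by hand the multi-index Series printed in Out[7].
# (Assumption: values and labels are copied from the output above.)
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2000-10-31'), 'sum'), (pd.Timestamp('2000-11-30'), 'sum')],
    names=['date', None],
)
stacked = pd.Series([10, 13], index=index)

# unstack() moves the innermost index level ('sum') into the columns,
# giving a DataFrame with one row per month, like Out[4].
frame = stacked.unstack()
print(frame)
```

This only works around the shape difference after the fact; it does not explain why the two Grouper calls return different types.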
In this example, assigning dates to months leaves the groups unchanged, so I expected identical results whether I set freq='M' or not. I'm guessing the difference is that freq='M' causes an extra groupby to happen under the hood, yes? When I ran into this, I expected pd.Grouper(freq='M', key='date') to do a single groupby, combining rows whose dates fall in the same month.
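A minimal sketch of the "single groupby per month" result described above, written against a recent pandas release rather than 0.17.1. Several things here are assumptions and not part of the report: the third row (a second October date) is hypothetical and is added only to show rows being combined; the names `by_period` and `by_grouper` are made up; and `freq='MS'` (month start) stands in for `'M'`, because newer pandas releases deprecate the `'M'` offset alias in favor of `'ME'`. Both variants use a built-in reduction instead of `apply`, so each returns a Series.

```python
import pandas as pd

# Data from the report plus one hypothetical extra October row.
df = pd.DataFrame({
    'date': ['10/10/2000', '11/10/2000', '10/20/2000'],
    'value': [10, 13, 5],
})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Variant 1: group on the calendar month of each date (a Period key).
by_period = df.groupby(df['date'].dt.to_period('M'))['value'].sum()

# Variant 2: group with a frequency Grouper; 'MS' labels each bin by
# the first day of its month instead of the last.
by_grouper = df.groupby(pd.Grouper(key='date', freq='MS'))['value'].sum()

print(by_period)
print(by_grouper)
```

The two variants differ only in how the month is labeled in the index (a `Period` versus a month-start `Timestamp`); the per-month sums are the same.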
Pandas version:
In [9]: pd.__version__
Out[9]: '0.17.1+22.g0c43fcc'