BUG/API: inconsistent results in a groupby-apply when mix of scalar/Series are returned #5592

jreback · 2013-11-26T18:41:04Z

http://stackoverflow.com/questions/20224564/how-does-pandas-grouped-apply-decide-on-output-and-why-does-this-depend-on-w/20225276#20225276

michaelaye · 2013-12-10T20:19:53Z

I stumbled and fell harshly and scrambled for several days to recover from this one. :(
I was using the following user function to return a None in certain cases:

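        def process_calblock(df):
            # Nested inside a method, hence the reference to self
            cb = CalBlock(df, self.SV_NUM_SKIP_SAMPLE)
            if cb.kind == "ST":
                # Blocks of kind "ST" have no usable time, so return None
                return None
            return cb.bb_time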

Until this commit, calling calgrouped.apply(process_calblock) returned the following, which was useful to me:

calib_block_labels
1                     2009-09-28 09:07:58.558000
2                     2009-09-28 09:18:15.019000
3                     2009-09-28 09:27:30.039000
4                                           None
5                     2009-09-28 09:38:49.989000
6                     2009-09-28 09:49:06.450000
7                     2009-09-28 09:59:24.959000
dtype: object

Since this commit I get a most mysterious error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-cfb5bb4d8eec> in <module>()
     10 # cgood = calib.Calibrator(dfgood)
     11 cbad = calib.Calibrator(dfbad)
---> 12 cbad.calgrouped.apply(process_calblock)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    371             return func(g, *args, **kwargs)
    372 
--> 373         return self._python_apply_general(f)
    374 
    375     def _python_apply_general(self, f):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _python_apply_general(self, f)
    377 
    378         return self._wrap_applied_output(keys, values,
--> 379                                          not_indexed_same=mutated)
    380 
    381     def aggregate(self, func, *args, **kwargs):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _wrap_applied_output(self, keys, values, not_indexed_same)
   2132                 if v is None:
   2133                     return DataFrame()
-> 2134                 values = [ x if x is not None else v._constructor(**v._construct_axes_dict()) for x in values ]
   2135 
   2136             v = values[0]

AttributeError: 'Timestamp' object has no attribute '_constructor'

As I understand from the Stack Overflow discussion, this has something to do with the fact that I return None?

So, do I have to change my code because this is an API change, or is this a regression?

jtratner · 2013-12-10T20:27:30Z

Can you post something small and reproducible that generates this error?

jreback · 2013-12-10T20:33:03Z

@michaelaye are you using current master? This should work.

jreback · 2013-12-10T20:33:54Z

Oh, sorry... are you returning None, a Timestamp, or a Series (or just None or a Timestamp)? Hmm.

michaelaye · 2013-12-10T21:31:29Z

Yes, exactly. I usually return a Timestamp, which is the middle time point of the group, but in certain cases I should not use this time, and then I return None. This does not work in master.

I guess I can work around this by filtering the DataFrame before grouping.
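
A minimal sketch of that filtering workaround, assuming a made-up DataFrame in which a hypothetical "kind" column stands in for cb.kind and a hypothetical mid_time helper stands in for cb.bb_time (neither is from the actual code above):

```python
import pandas as pd

# Hypothetical stand-in data: "kind" marks the blocks to skip ("ST").
df = pd.DataFrame({
    "calib_block_labels": [1, 1, 2, 2, 3, 3],
    "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
    "time": pd.to_datetime([
        "2009-09-28 09:07:00", "2009-09-28 09:08:00",
        "2009-09-28 09:17:00", "2009-09-28 09:19:00",
        "2009-09-28 09:27:00", "2009-09-28 09:28:00",
    ]),
})

def mid_time(times):
    # Hypothetical helper: midpoint between the first and last sample of a group.
    return times.min() + (times.max() - times.min()) / 2

# Drop the unwanted rows first, so the applied function never has to return None.
kept = df[df["kind"] != "ST"]
bb_times = kept.groupby("calib_block_labels")["time"].apply(mid_time)
print(bb_times)
```

The skipped block simply does not appear in the result, so this only fits if a missing label is acceptable in place of a None entry.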

jreback · 2013-12-10T22:20:14Z

OK, this was a minor fix: PR #5675. The dtypes of the columns should also be correct if the original is datetimelike and you return a None (the result will be datetime64[ns], with the Nones as NaTs).

In this case you will get a Series back that is correctly dtyped.
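
A small sketch of the case described here, using made-up data with hypothetical "kind" and "time" columns (not the reporter's actual code). On recent pandas versions the None is expected to show up as a missing value in the returned Series; the exact datetime resolution of the dtype may differ between versions.

```python
import pandas as pd

# Hypothetical stand-in data: block 2 is of kind "ST" and should yield None.
df = pd.DataFrame({
    "calib_block_labels": [1, 1, 2, 2, 3, 3],
    "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
    "time": pd.to_datetime([
        "2009-09-28 09:07:00", "2009-09-28 09:08:00",
        "2009-09-28 09:17:00", "2009-09-28 09:19:00",
        "2009-09-28 09:27:00", "2009-09-28 09:28:00",
    ]),
})

def process_block(group):
    # Return None for "ST" blocks, otherwise a scalar Timestamp.
    if (group["kind"] == "ST").any():
        return None
    return group["time"].min()

# Select the columns explicitly so the grouping column is not passed to the function.
result = df.groupby("calib_block_labels")[["kind", "time"]].apply(process_block)
print(result)
print(result.dtype)
```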

michaelaye · 2013-12-10T22:25:27Z

I confirm that PR #5675 works here, thanks for the quick fix!

calib_block_labels
1                    2009-09-28 09:07:58.558000
2                    2009-09-28 09:18:15.019000
3                    2009-09-28 09:27:30.039000
4                                           NaT
5                    2009-09-28 09:38:49.989000
6                    2009-09-28 09:49:06.450000
7                    2009-09-28 09:59:24.959000
dtype: datetime64[ns]

jreback · 2013-12-10T22:27:18Z

gr8!

keep em coming!

jreback mentioned this issue Nov 26, 2013

BUG: Bug in groupby returning non-consistent types when user function returns a None, (GH5992) #5593

Merged

jreback closed this as completed in #5593 Nov 26, 2013

jreback mentioned this issue Dec 10, 2013

BUG: properly handle a user function in groupby that returns all scalars (GH5592) #5675

Merged

danielballan mentioned this issue Jun 5, 2014

ENH: Allow aggregate numeric operations on timedelta64. #6884

Closed
