BUG/API: inconsistent results in a groupby-apply when mix of scalar/Series are returned #5592

Closed
jreback opened this issue Nov 26, 2013 · 8 comments · Fixed by #5593

Comments

@michaelaye

Contributor

commented Dec 10, 2013

I stumbled and fell harshly and scrambled for several days to recover from this one. :(
I was using the following user function, which returns None in certain cases:

        def process_calblock(df):
            cb = CalBlock(df, self.SV_NUM_SKIP_SAMPLE)
            if cb.kind == "ST":
                # No usable time for "ST" blocks: return None explicitly.
                return None
            return cb.bb_time

With calgrouped.apply(process_calblock), I got the following result until this commit, which was useful to me:

calib_block_labels
1                     2009-09-28 09:07:58.558000
2                     2009-09-28 09:18:15.019000
3                     2009-09-28 09:27:30.039000
4                                           None
5                     2009-09-28 09:38:49.989000
6                     2009-09-28 09:49:06.450000
7                     2009-09-28 09:59:24.959000
dtype: object

Since this commit I get a most mysterious error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-cfb5bb4d8eec> in <module>()
     10 # cgood = calib.Calibrator(dfgood)
     11 cbad = calib.Calibrator(dfbad)
---> 12 cbad.calgrouped.apply(process_calblock)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    371             return func(g, *args, **kwargs)
    372 
--> 373         return self._python_apply_general(f)
    374 
    375     def _python_apply_general(self, f):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _python_apply_general(self, f)
    377 
    378         return self._wrap_applied_output(keys, values,
--> 379                                          not_indexed_same=mutated)
    380 
    381     def aggregate(self, func, *args, **kwargs):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _wrap_applied_output(self, keys, values, not_indexed_same)
   2132                 if v is None:
   2133                     return DataFrame()
-> 2134                 values = [ x if x is not None else v._constructor(**v._construct_axes_dict()) for x in values ]
   2135 
   2136             v = values[0]

AttributeError: 'Timestamp' object has no attribute '_constructor'

From what I understand from Stack Overflow, this has something to do with the fact that I return None?

So do I have to change my code because this is an API change, or is this a regression?

@jtratner

Contributor

commented Dec 10, 2013

Can you post something small and reproducible that generates this error?
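
A minimal sketch of what such a reproduction might look like. The data, the `kind` column, and `process_block` are hypothetical stand-ins for the reporter's `CalBlock` code, which is not shown in this thread. On the affected development build this pattern (a scalar Timestamp for some groups, None for others) would be expected to reach the failing code path; on versions that include the fix, the None group should come back as a missing value.

```python
import pandas as pd

# Hypothetical stand-in data: three calibration blocks, one of kind "ST".
df = pd.DataFrame(
    {
        "calib_block_labels": [1, 1, 2, 2, 3, 3],
        "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
        "time": pd.to_datetime(
            [
                "2009-09-28 09:00:00",
                "2009-09-28 09:10:00",
                "2009-09-28 09:20:00",
                "2009-09-28 09:30:00",
                "2009-09-28 09:40:00",
                "2009-09-28 09:50:00",
            ]
        ),
    }
)


def process_block(group):
    # Return None for "ST" blocks, otherwise the block's middle time point.
    if group["kind"].iloc[0] == "ST":
        return None
    start, end = group["time"].min(), group["time"].max()
    return start + (end - start) / 2


# Selecting the columns explicitly keeps the grouping column out of each group.
result = df.groupby("calib_block_labels")[["kind", "time"]].apply(process_block)
print(result)
```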

@jreback

Contributor Author

commented Dec 10, 2013

@michaelaye are you using current master? This should work.

@jreback

Contributor Author

commented Dec 10, 2013

oh...sorry...you are returning None or a Timestamp or a Series? (or just None or a Timestamp)...hmmm

@michaelaye

Contributor

commented Dec 10, 2013

Yes, exactly. I usually return a timestamp which is the middle time point of the group, but in certain cases I should not use this time, and then I return None. This does not work in master.

I guess I can work around it by filtering the DataFrame before grouping.
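
A rough sketch of that filtering workaround, assuming the condition can be expressed as a per-row column. The `kind` column and the sample data are hypothetical; in the reporter's code the kind is derived from a `CalBlock` object built per group. Note the trade-off: the filtered-out block disappears from the result instead of appearing as a missing value.

```python
import pandas as pd

# Hypothetical stand-in data: block 2 is of kind "ST" and should be skipped.
df = pd.DataFrame(
    {
        "calib_block_labels": [1, 1, 2, 2, 3, 3],
        "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
        "time": pd.to_datetime(
            [
                "2009-09-28 09:00:00",
                "2009-09-28 09:10:00",
                "2009-09-28 09:20:00",
                "2009-09-28 09:30:00",
                "2009-09-28 09:40:00",
                "2009-09-28 09:50:00",
            ]
        ),
    }
)

# Drop the unwanted rows first, so the applied function never returns None.
usable = df[df["kind"] != "ST"]

# Every remaining group yields a Timestamp (its middle time point).
result = usable.groupby("calib_block_labels")["time"].apply(
    lambda s: s.min() + (s.max() - s.min()) / 2
)
print(result)
```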

@jreback

Contributor Author

commented Dec 10, 2013

OK, it was a minor fix: PR #5675. The dtypes of the columns should also be correct if the original is datetimelike and you return None (the result will be datetime64[ns], with the Nones as NaTs).

In this case you will get a Series back that is correctly dtyped.
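
The dtype difference can be illustrated outside of groupby with the plain `Series` constructor. This is only a sketch of the coercion behaviour, not the code from the PR; the timestamps are copied from the output earlier in this thread, and the index labels are arbitrary.

```python
import pandas as pd

values = [
    pd.Timestamp("2009-09-28 09:07:58.558"),
    None,
    pd.Timestamp("2009-09-28 09:38:49.989"),
]

# Letting pandas infer the dtype gives a datetime64 Series; None becomes NaT.
inferred = pd.Series(values, index=[1, 2, 3])
print(inferred.dtype)
print(inferred.isna().tolist())

# Forcing object dtype keeps the raw None, similar to the older output above.
as_object = pd.Series(values, index=[1, 2, 3], dtype=object)
print(as_object.dtype)
```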

@michaelaye

Contributor

commented Dec 10, 2013

I confirm that PR #5675 works here, thanks for the quick fix!

calib_block_labels
1                    2009-09-28 09:07:58.558000
2                    2009-09-28 09:18:15.019000
3                    2009-09-28 09:27:30.039000
4                                           NaT
5                    2009-09-28 09:38:49.989000
6                    2009-09-28 09:49:06.450000
7                    2009-09-28 09:59:24.959000
dtype: datetime64[ns]

@jreback

Contributor Author

commented Dec 10, 2013

gr8!

keep em coming!

3 participants