BUG/API: inconsistent results in a groupby-apply when mix of scalar/Series are returned #5592

Closed
jreback opened this issue Nov 26, 2013 · 8 comments · Fixed by #5593

@jreback

jreback commented Nov 26, 2013

@michaelaye

I stumbled and fell harshly and scrambled for several days to recover from this one. :(
I was using the following user function to return a None in certain cases:

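        def process_calblock(df):
            # CalBlock and SV_NUM_SKIP_SAMPLE are defined elsewhere in my code (not shown).
            cb = CalBlock(df, self.SV_NUM_SKIP_SAMPLE)
            if cb.kind == "ST":
                # For blocks of this kind the time should not be used.
                return None
            return cb.bb_time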

Using calgrouped.apply(process_calblock), until this commit I got back the following, which was useful to me:

calib_block_labels
1                     2009-09-28 09:07:58.558000
2                     2009-09-28 09:18:15.019000
3                     2009-09-28 09:27:30.039000
4                                           None
5                     2009-09-28 09:38:49.989000
6                     2009-09-28 09:49:06.450000
7                     2009-09-28 09:59:24.959000
dtype: object

Since this commit I get a most mysterious error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-cfb5bb4d8eec> in <module>()
     10 # cgood = calib.Calibrator(dfgood)
     11 cbad = calib.Calibrator(dfbad)
---> 12 cbad.calgrouped.apply(process_calblock)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    371             return func(g, *args, **kwargs)
    372 
--> 373         return self._python_apply_general(f)
    374 
    375     def _python_apply_general(self, f):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _python_apply_general(self, f)
    377 
    378         return self._wrap_applied_output(keys, values,
--> 379                                          not_indexed_same=mutated)
    380 
    381     def aggregate(self, func, *args, **kwargs):

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1169_g9aae1a8-py2.7-macosx-10.6-x86_64.egg/pandas/core/groupby.pyc in _wrap_applied_output(self, keys, values, not_indexed_same)
   2132                 if v is None:
   2133                     return DataFrame()
-> 2134                 values = [ x if x is not None else v._constructor(**v._construct_axes_dict()) for x in values ]
   2135 
   2136             v = values[0]

AttributeError: 'Timestamp' object has no attribute '_constructor'

which, as I understand from Stack Overflow, has something to do with the fact that I return None?

So, do I have to change my code because this is an API change, or is this a regression?

@jtratner

Can you post something small and reproducible that generates this error?

@jreback

jreback commented Dec 10, 2013

@michaelaye are you using current master? this should work

@jreback

jreback commented Dec 10, 2013

oh...sorry...you are returning None or a Timestamp or a Series? (or just None or a Timestamp)...hmmm

@michaelaye

Yes, exactly. I usually return a Timestamp, which is the middle time point of the group, but in certain cases I should not use this time, and then I return None. This does not work in master.

I guess I can work around it by filtering the DataFrame before grouping.
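
A minimal sketch of that workaround, on made-up toy data rather than the real calibration frame. The column names `block`, `kind` and `time` are assumptions for illustration only, and taking the middle row stands in for whatever `CalBlock.bb_time` actually computes:

```python
import pandas as pd

# Toy data (hypothetical): three blocks, the middle one of kind "ST".
df = pd.DataFrame(
    {
        "block": [1, 1, 2, 2, 3, 3],
        "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
        "time": pd.to_datetime(
            [
                "2009-09-28 09:07:58",
                "2009-09-28 09:08:00",
                "2009-09-28 09:18:15",
                "2009-09-28 09:18:17",
                "2009-09-28 09:27:30",
                "2009-09-28 09:27:32",
            ]
        ),
    }
)

# Drop the unwanted rows first, so the applied function never has to return None.
kept = df[df["kind"] != "ST"]

# Take the middle row of each remaining group as its representative time.
result = kept.groupby("block")["time"].apply(lambda s: s.iloc[len(s) // 2])
print(result)
```

Note that with this approach the filtered-out blocks are simply absent from the result instead of showing up as missing values, so the index no longer lists every block label.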

@jreback

jreback commented Dec 10, 2013

ok...it was a minor fix: PR #5675. The dtypes of the columns should be correct as well: if the original is datetimelike and you return a None, the result will be datetime64[ns] with the Nones as NaTs.

In this case you will get back a Series that is correctly dtyped.
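
For illustration, a small self-contained sketch of the pattern discussed here: an applied function that returns either a Timestamp or None. The data and the column names `block`, `kind` and `time` are made up, and `middle_time` is a hypothetical stand-in for the user's `process_calblock`. On a recent pandas version I would expect a datetime-typed Series with NaT for the group that returned None:

```python
import pandas as pd

# Toy data (hypothetical): three blocks, the middle one of kind "ST".
df = pd.DataFrame(
    {
        "block": [1, 1, 2, 2, 3, 3],
        "kind": ["BB", "BB", "ST", "ST", "BB", "BB"],
        "time": pd.to_datetime(
            [
                "2009-09-28 09:07:58",
                "2009-09-28 09:08:00",
                "2009-09-28 09:18:15",
                "2009-09-28 09:18:17",
                "2009-09-28 09:27:30",
                "2009-09-28 09:27:32",
            ]
        ),
    }
)


def middle_time(group):
    """Return the middle timestamp of a group, or None for "ST" groups."""
    if group["kind"].iloc[0] == "ST":
        return None
    return group["time"].iloc[len(group) // 2]


# Select the data columns explicitly so the grouping column itself
# is not part of what gets passed to the function.
result = df.groupby("block")[["kind", "time"]].apply(middle_time)
print(result)
print(result.dtype)
```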

@michaelaye

I confirm that PR #5675 works here, thanks for the quick fix!

calib_block_labels
1                    2009-09-28 09:07:58.558000
2                    2009-09-28 09:18:15.019000
3                    2009-09-28 09:27:30.039000
4                                           NaT
5                    2009-09-28 09:38:49.989000
6                    2009-09-28 09:49:06.450000
7                    2009-09-28 09:59:24.959000
dtype: datetime64[ns]

@jreback

jreback commented Dec 10, 2013

gr8!

keep em coming!
