BUG/ENH: groupby with a list of customgroup and string should work #3794

Closed
jreback opened this issue Jun 7, 2013 · 12 comments · Fixed by #6516

Comments

@jreback
Contributor

jreback commented Jun 7, 2013

see #3791, #2450

The following should work (but raises now):

import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

# group by 6-month time bins and by Buyer; should yield a MultiIndex result
df = df.set_index('Date', drop=False)
df.groupby([pd.TimeGrouper('6M'), 'Buyer']).sum()
@cpcloud
Member

cpcloud commented Jun 7, 2013

@jreback forgive me for not reading the threads, but why not just do a resample then groupby buyer?

@jreback
Contributor Author

jreback commented Jun 7, 2013

Because you don't want to actually resample (and reduce the time groups); rather, you really want a grouping by time, then a sub-grouping by Buyer in this case, then to apply the function

and end up with a multi-level index.
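
For reference, a minimal sketch of the intended result, assuming the pd.Grouper API that later pandas versions provide, and continuing from the df in the example above:

# sketch only: a Grouper over the DatetimeIndex combined with a column name
# in one groupby call, giving a (Date, Buyer) MultiIndex of sums
df.groupby([pd.Grouper(freq='6M'), 'Buyer'])['Quantity'].sum()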

@jreback
Contributor Author

jreback commented Jun 7, 2013

you CAN do it with a nested how function (or actually that might be a bug too)
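
A rough sketch of that nested workaround, continuing from the df in the example above (pd.TimeGrouper is the API of this era; later pandas replaces it with pd.Grouper):

# group on the 6-month time bins first, then group each sub-frame by 'Buyer'
# inside apply; the stacked result carries a (Date, Buyer) MultiIndex
df.groupby(pd.TimeGrouper('6M')).apply(
    lambda g: g.groupby('Buyer')['Quantity'].sum())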

@cpcloud
Member

cpcloud commented Jun 7, 2013

ah i c. duh.

@cpcloud
Member

cpcloud commented Jun 7, 2013

i think what i was thinking of is df.groupby(TimeGrouper('6M')).sum().groupby('Buyer').sum()

@jreback
Contributor Author

jreback commented Jun 7, 2013

that doesn't work because the string columns are not propagated (I think if you did a custom how instead of sum you could make it work, though)
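
A quick sketch of why the chained version fails, continuing from the example df:

# the first reduction drops the non-numeric columns, so the second
# groupby has no 'Buyer' column left to group on
intermediate = df.groupby(pd.TimeGrouper('6M')).sum()
'Buyer' in intermediate.columns    # False
intermediate.groupby('Buyer')      # raises KeyError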

@cpcloud
Member

cpcloud commented Jun 7, 2013

I haven't really thought about the formalisms behind groupby, but are reductions supposed to distribute across them in general? I.e., df.gb(grp1).reduc.gb(grp2).reduc == df.gb([grp1, grp2]).reduc (pseudocode)

@cpcloud
Member

cpcloud commented Jun 7, 2013

if f_1, f_2, ..., f_n are grouping functions, should you be able to do reduc(f_1 * f_2 * ... * f_n), where * is composition and reduc is a reduction? I guess that is what I'm asking...

@jreback
Contributor Author

jreback commented Jun 7, 2013

groupby could be distributive, but it usually is not; if the first operation were a transform (rather than a reduction), then it could be distributive.
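
A small sketch of that transform-vs-reduction distinction, continuing from the example df (illustrative only; the particular numbers are not meaningful):

# a transform is shape-preserving, so the 'Buyer' column survives and a
# follow-up groupby on it still works
tmp = df.copy()
tmp['Quantity'] = tmp.groupby(pd.TimeGrouper('6M'))['Quantity'].transform(lambda s: s.sum())
tmp.groupby('Buyer')['Quantity'].sum()                     # fine

# a reduction collapses the frame and drops 'Buyer', so chaining fails
df.groupby(pd.TimeGrouper('6M')).sum().groupby('Buyer')    # raises KeyError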

@cpcloud
Member

cpcloud commented Jun 7, 2013

ah ok. thanks.

@hayd
Contributor

hayd commented Jun 13, 2013

Also, groupbys from TimeGroupers seem to have some useful functionality suppressed/hidden (?); made a separate issue.

@jreback
Contributor Author

jreback commented Jun 13, 2013

@hayd maybe make a separate issue and link to this one

FYI, for the SO question this came from (using TimeGrouper and between_time):
use the new groupby filter method with between_time as your discriminator!
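
A rough sketch of that suggestion, continuing from the example df; the time window here is an arbitrary illustrative choice:

# keep only the groups that have at least one row inside the time window,
# using DataFrame.between_time (requires the DatetimeIndex) as the discriminator
df.groupby('Buyer').filter(lambda g: len(g.between_time('13:00', '15:00')) > 0)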
