TimeGrouper only works properly when using a sorted DateTimeIndex #4161

Closed
aschilling opened this issue Jul 8, 2013 · 3 comments · Fixed by #6350
Labels
Bug, Datetime (Datetime data dtype), Error Reporting (Incorrect or improved errors from pandas), Groupby
Milestone

Comments

@aschilling

related SO question

http://stackoverflow.com/questions/19636370/dataframe-groupbytimegrouper-d-invalid-length-for-values-or-for-binner/19636874?noredirect=1#comment29155715_19636874

Hi everybody,

Today I spent the whole afternoon figuring out that TimeGrouper only works properly when used on a sorted(!) DatetimeIndex.

If the DatetimeIndex is not sorted, TimeGrouper throws no error (!) but produces corrupt results.

I would suggest either calling sort() within TimeGrouper or modifying it to raise an error. The fact that it silently produces corrupt results when the DatetimeIndex is not sorted is really disturbing.

Sorry that I cannot add an example for the bug. All the simple examples I created worked fine; it seems that this only leads to corrupt results with DataFrames of size 700 and above.

Best regards

Andy

@jreback
Contributor

jreback commented Jul 10, 2013

Can you put up an example (if you can reproduce it)?

@aschilling
Author

Sorry for the delay, I had to wait until the problem recurred. Take the following DataFrame:

import datetime as DT
import pandas as pd

# The last date is deliberately out of order, so the index is unsorted.
df = pd.DataFrame({
    'Buyer': 'Carl Carl Carl Carl Joe Carl'.split(),
    'Quantity': [18, 3, 5, 1, 9, 3],
    'Date': [
        DT.datetime(2013, 9, 1, 13, 0),
        DT.datetime(2013, 9, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 3, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 9, 2, 14, 0),
    ]})
df = df.set_index(['Date'])

If you execute

df.groupby(TimeGrouper(freq='5D')).sum()

you get different results, depending on whether you do

df.sort_index(inplace=True)

before the groupby or not.
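
For anyone who wants to check this on their own install, here is a minimal sketch of the comparison. It assumes a pandas version where `pd.Grouper(freq=...)` is available and uses it in place of `TimeGrouper(freq=...)`; that substitution, the `Quantity` column selection, and the variable names are my own choices, not part of the original report. On a version that includes the fix, the two results would be expected to match.

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    "Buyer": "Carl Carl Carl Carl Joe Carl".split(),
    "Quantity": [18, 3, 5, 1, 9, 3],
    "Date": [
        DT.datetime(2013, 9, 1, 13, 0),
        DT.datetime(2013, 9, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 3, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 9, 2, 14, 0),
    ],
}).set_index("Date")

# Assumption: pd.Grouper(freq=...) stands in for TimeGrouper(freq=...).
# Group the frame exactly as built (index out of order) ...
unsorted_result = df.groupby(pd.Grouper(freq="5D"))["Quantity"].sum()
# ... and again after sorting the index first.
sorted_result = df.sort_index().groupby(pd.Grouper(freq="5D"))["Quantity"].sum()

# True means sorting made no difference on the installed pandas version.
results_match = unsorted_result.equals(sorted_result)
print("index sorted:", df.index.is_monotonic_increasing)
print("results match:", results_match)
```

If `results match` prints `False`, the installed version is still affected and sorting the index before grouping is the safe workaround.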

Best regards

Andy

@jreback
Contributor

jreback commented Oct 1, 2013

I think it's required to be sorted, so it should just raise. Will look at this for 0.14 (unless you want to do a PR soon!)
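
Until something like that lands, the "just raise" behaviour can be approximated on the user side. The sketch below is only an illustration: `group_sum_by_time` is a hypothetical helper, not a pandas function, and it again assumes `pd.Grouper(freq=...)` as the grouper. It refuses an unsorted DatetimeIndex instead of grouping it silently.

```python
import pandas as pd


def group_sum_by_time(df, freq):
    """Sum columns per time bin; refuse an unsorted index (hypothetical helper)."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DatetimeIndex")
    if not df.index.is_monotonic_increasing:
        raise ValueError("DatetimeIndex must be sorted; call df.sort_index() first")
    return df.groupby(pd.Grouper(freq=freq)).sum()


# Small frame whose last timestamp is out of order.
dates = pd.to_datetime(["2013-09-01 13:00", "2013-12-02 12:00", "2013-09-02 14:00"])
df = pd.DataFrame({"Quantity": [18, 9, 3]}, index=dates)

try:
    group_sum_by_time(df, "5D")
except ValueError as exc:
    print("rejected:", exc)

# Sorting first satisfies the check.
totals = group_sum_by_time(df.sort_index(), "5D")
print(totals)
```

Raising rather than sorting inside the helper keeps the caller's frame untouched and makes the ordering requirement explicit; sorting internally would be the other option suggested earlier in this thread.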
