Error with downsampling intraday data where end.time() < start.time() #1772

Closed
dalejung opened this issue Aug 16, 2012 · 1 comment

Comments

@dalejung
Contributor

commented Aug 16, 2012

Simple Example

import datetime

import pandas as pd

start = datetime.datetime(1999, 3, 1, 5)
# end hour (04:00) is earlier than the start hour (05:00)
end = datetime.datetime(2012, 7, 31, 4)
bad_ind = pd.date_range(start, end, freq="30min")
df = pd.DataFrame({'close': 1}, index=bad_ind)
try:
    # downsample the 30-minute data to annual (year-start) bins
    df.resample('AS', 'sum')
except ValueError as e:
    print(e)

Long example:
http://nbviewer.maxdrawdown.com/3344040/intraday%20binning%20error.ipynb

Tracking it down, the problem appears to be that _get_range_edges carries the time of day over when downsampling intraday data. So when generate_range is called during the DatetimeIndex creation, the final bin edge doesn't pass the while cur <= end check and is never generated.
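A minimal sketch of that failure, using only the standard library rather than pandas itself. The names strict_range and next_year are hypothetical stand-ins, and the two edge values are an assumption about what the edge computation produces for the example above (first edge rolled back to the year start, last edge pushed one year start past the data, both keeping their time of day):

```python
import datetime


def next_year(ts):
    """Step forward one year, keeping month, day, and time of day."""
    return ts.replace(year=ts.year + 1)


def strict_range(start, end, step):
    """Yield dates from start while cur <= end (simplified stand-in)."""
    cur = start
    while cur <= end:
        yield cur
        cur = step(cur)


# Assumed edges (not taken from pandas output): each keeps the time of day
# of the sample it was derived from.
first_edge = datetime.datetime(1999, 1, 1, 5)  # from the 05:00 first sample
last_edge = datetime.datetime(2013, 1, 1, 4)   # from the 04:00 last sample
last_sample = datetime.datetime(2012, 7, 31, 4)

edges = list(strict_range(first_edge, last_edge, next_year))

# The 2013-01-01 05:00 candidate is later than last_edge, so the loop stops
# before yielding it and the final edge falls before the last sample.
print(edges[-1])
print(edges[-1] < last_sample)
```

Under these assumptions the last generated edge is earlier than the last data point, so the final year of data has no bin to land in.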

Thinking about it, there are two issues.

  1. generate_range should never output an index that doesn't include end. Maybe something like:
    while True:
        yield cur

        # last
        if cur >= end:
            break
        cur = cur + offset  # advance to the next date
  2. _get_range_edges should generate a range that is perfectly divisible by the freq. For the downsampling, we'd have to change the time, either by adjusting the end time or by just zeroing both out. I don't know how many users rely on this behavior, though.
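Both proposals can be sketched with the standard library alone. This is an illustration, not pandas code: covering_range, next_year, and zero_time are hypothetical helpers, and the edge values are assumed to be the ones the example would produce when the time of day is carried over.

```python
import datetime


def next_year(ts):
    """Step forward one year, keeping month, day, and time of day."""
    return ts.replace(year=ts.year + 1)


def covering_range(start, end, step):
    """Yield dates from start, stopping only once one reaches or passes end."""
    cur = start
    while True:
        yield cur
        if cur >= end:
            break
        cur = step(cur)


def zero_time(ts):
    """Drop the time of day so the edge sits on midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Assumed edges with mismatched times of day (05:00 vs 04:00).
first_edge = datetime.datetime(1999, 1, 1, 5)
last_edge = datetime.datetime(2013, 1, 1, 4)

# Proposal 1: the generator always emits an edge at or beyond end.
covered = list(covering_range(first_edge, last_edge, next_year))

# Proposal 2: zero both times so the span is a whole number of years.
aligned = list(
    covering_range(zero_time(first_edge), zero_time(last_edge), next_year)
)

print(covered[-1] >= last_edge)
print(aligned[-1] == zero_time(last_edge))
```

With the first proposal the final edge overshoots end by an hour; with the second the final edge lands exactly on the zeroed end, which is the "perfectly divisible" case.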

@ghost ghost assigned wesm Sep 11, 2012

@wesm wesm closed this in 54b54f8 Sep 11, 2012

@wesm

Member

commented Sep 11, 2012

I think that any non-"Tick" offsets (e.g. AS-DEC, as you're doing there) should zero out the start and end times. This fixes your test case. All the other tests pass, though I haven't thought through the ways this could cause other bugs (hopefully none). Your point #1 is one way to look at it. When I thought about the date range API initially, I felt that the start and end times should be strict, with no dates generated outside them (roll forward start / roll back end). Requiring that the range include both endpoints is the other way (roll back start / roll forward end). It would be difficult to change now...
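To make the two conventions concrete, here is a small standard-library sketch for year-start edges with zeroed times. The helpers roll_back, roll_forward, and year_starts are hypothetical and only mimic the idea of rolling a timestamp to a year boundary; they are not the pandas offset methods.

```python
import datetime


def roll_back(ts):
    """Latest Jan 1 00:00 at or before ts."""
    return datetime.datetime(ts.year, 1, 1)


def roll_forward(ts):
    """Earliest Jan 1 00:00 at or after ts."""
    floor = roll_back(ts)
    return floor if floor == ts else datetime.datetime(ts.year + 1, 1, 1)


def year_starts(first, last):
    """All Jan 1 00:00 dates from first.year through last.year."""
    return [datetime.datetime(y, 1, 1) for y in range(first.year, last.year + 1)]


start = datetime.datetime(1999, 3, 1, 5)
end = datetime.datetime(2012, 7, 31, 4)

# Strict: nothing outside [start, end] (roll forward start / roll back end).
strict = year_starts(roll_forward(start), roll_back(end))

# Covering: both endpoints enclosed (roll back start / roll forward end).
covering = year_starts(roll_back(start), roll_forward(end))

print(strict[0], strict[-1])
print(covering[0], covering[-1])
```

The strict range suits date generation, where no date should fall outside the requested bounds; the covering range is what binning needs, since every observation must fall between two edges.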
