Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: period on weekends rolling to odd business days #5203

Closed
jreback opened this issue Oct 12, 2013 · 18 comments · Fixed by #5220
Closed

BUG: period on weekends rolling to odd business days #5203

jreback opened this issue Oct 12, 2013 · 18 comments · Fixed by #5220
Labels
Bug Period Period data type
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Oct 12, 2013

For biz days, seems saturday is rolling to monday, but sunday to friday
looks like a bug (though it might just be ambiguous/undefined behavior).

In [11]: pd.Period('2013-10-4', freq='B')
Out[11]: Period('2013-10-04', 'B')

In [12]: pd.Period('2013-10-5', freq='B')
Out[12]: Period('2013-10-07', 'B')

In [13]: pd.Period('2013-10-6', freq='B')
Out[13]: Period('2013-10-04', 'B')

In [14]: pd.Period('2013-10-7', freq='B')
Out[14]: Period('2013-10-07', 'B')
@jreback
Copy link
Contributor Author

jreback commented Oct 12, 2013

cc @Cancan01

@jtratner
Copy link
Contributor

I don't really understand how that can be defined. Is the idea it should be next business day? nearest?

@vipatel10
Copy link

My initial thought is that it should be the previous business day. That way, if you're calculating the difference between two dates, and you come up to a weekend, the count doesn't advance until you hit Monday.

@cancan101
Copy link
Contributor

@jreback I think you mean @cancan101

That being said, so far I have tracked down the "issue" to:

In [29]: pd.tslib.period_format(pd.tslib.period_ordinal(2013,10,6,0,0,0,0,0,5000),5000)
Out[29]: u'2013-10-04'

In [30]: pd.tslib.period_format(pd.tslib.period_ordinal(2013,10,5,0,0,0,0,0,5000),5000)
Out[30]: u'2013-10-07'

where 5000 is pd.tseries.period._gfc('B')[0]

going further:

In [32]: pd.tslib.period_ordinal(2013,10,6,0,0,0,0,0,5000)
Out[32]: 11416

In [33]: pd.tslib.period_ordinal(2013,10,5,0,0,0,0,0,5000)
Out[33]: 11417

we can see that 10/6 maps to an ordinal < the ordinal for 10/5

@cancan101
Copy link
Contributor

My guess is that there is an issue here in period.c:

        days = absdate_from_ymd(year, month, day)
        weeks = days / 7;
        return (npy_int64)(days - weeks * 2) - BDAY_OFFSET;

@jreback
Copy link
Contributor Author

jreback commented Oct 13, 2013

cc @wuan

can you take a look see if anything changed?

@cancan101
Copy link
Contributor

I think there is an issue with days / 7. My guess is that is should be (days - K)/7 for some K.

@jreback
Copy link
Contributor Author

jreback commented Oct 13, 2013

@cancan101 thanks....having @wuan check this as I believe this was touched in Nano refactor

@cancan101
Copy link
Contributor

I exposed get_absdate_from_ymd:

In [2]: pd.tslib.get_absdate_from_ymd(2013,10,6)
Out[2]: 735147

In [3]: pd.tslib.get_absdate_from_ymd(2013,10,5)
Out[3]: 735146

which then gives:

In [6]: pd.tslib.get_absdate_from_ymd(2013,10,6) -  (pd.tslib.get_absdate_from_ymd(2013,10,6)/7) * 2
Out[6]: 525105

In [7]: pd.tslib.get_absdate_from_ymd(2013,10,5) -  (pd.tslib.get_absdate_from_ymd(2013,10,5)/7) * 2
Out[7]: 525106

so yes, it looks like an offset needs to be subtracted from days. The exact offset depends on whether the desired behavior is to roll forward or backward.

@cancan101
Copy link
Contributor

That being said, I agree with @jtratner that this entire behavior is undefined. That being said, this rolls forward, not backward:

In [13]: pd.Timestamp(datetime.date(2013,10,5)) + pd.offsets.BDay(0)
Out[13]: Timestamp('2013-10-07 00:00:00', tz=None)

@wuan
Copy link
Contributor

wuan commented Oct 14, 2013

I fear that the code in period.c never worked. The week number is calculated without subtracting the start of the week offset of day 0 (cf. the calculation of the week period, where 1 is substracted from the day ordinal).

In the end the week boundary is shifted by one day (to Saturday/Sunday). This leads to values of 1-0=1 for Saturday and to 2-2=0 relative to the business day count of Friday (first value is the day count and the second is the substracted week count * 2).

After fixing this only two tests fail because of different dates. More later on ...

@jreback
Copy link
Contributor Author

jreback commented Oct 14, 2013

great lmk

@wuan
Copy link
Contributor

wuan commented Oct 14, 2013

In wuan/pandas/bdays_fix you can have a look at the fixed code. The tests which were broken after implementing the fix seemed to be incorrect as well. The should now show the expected behaviour described above by @vpatel34.

The above snippet

        days = absdate_from_ymd(year, month, day)
        weeks = days / 7;
        return (npy_int64)(days - weeks * 2) - BDAY_OFFSET;

was replaced by

        days = absdate_from_ymd(year, month, day)
        // calculate the current week assuming sunday as last day of a week
        weeks = (days - 1) / 7;
        // calculate the current weekday (in range 0 .. 6)
        delta = (days - 1) - weeks * 7;
        return (npy_int64)(weeks * 5 + 1 + (delta <= 5 ? delta : 5)) - BDAY_OFFSET;

So far I had no idea how to reduce it further, so it may have a small performance impact due to the added calculations and logic.

@cancan101
Copy link
Contributor

@wuan That seems a bit complicated. Just adding or subtracting an offset from days was not enough?

@wuan
Copy link
Contributor

wuan commented Oct 14, 2013

I think it is not working without additional logic: The day counter is increasing every day and the week counter is increasing every week. The weekend has two days which have to show the same value. At the moment I'm very sure that you have to calculate the weekday in order to have the correct result for both days. @cancan101, you are welcome to supply an example.

@jreback
Copy link
Contributor Author

jreback commented Oct 14, 2013

@wuan put this up as a PR and will take a look

@cancan101
Copy link
Contributor

What about with some sort of diff = (days - k) % 7 == M ? 1 : 0

@wuan
Copy link
Contributor

wuan commented Oct 14, 2013

I'm not so sure about fixing the old code, because it is now already difficult to understand. You really have to find out, that the difference between day 0 and the end of week (ending the week on Saturday) is implicitly used for the calculation. What about the refined and hopefully more clear version:

        // calculate the current week assuming sunday as last day of a week
        weeks = (days - 1) / 7;
        // calculate the current weekday (in range 1 .. 7)
        delta = (days - 1) % 7 + 1;
        // return the number of business days in full weeks plus the business days in the last - possible partial - week
        return (npy_int64)(weeks * 5) + (delta <= 5 ? delta : 6) - BDAY_OFFSET;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Period Period data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants