Initial implementation of holiday and holiday calendar. #6719

Closed

rockg
Contributor

@rockg rockg commented Mar 27, 2014

Implementation of holidays based on DateOffset objects and calendars that are collections of those holidays. These calendars can be passed into CustomBusinessDay. Also, add an Easter offset for use in holiday calendars.

Currently the default dates are hard-coded in the calendar class but this should move somewhere else and be calculated differently. I'm open to suggestions here. Also, the 4-day weekend observance needs to be incorporated (for example, if July 4th is on a Tuesday, perhaps Monday is also a holiday). This would require a little bit of tweaking, but not much.

from pandas.tseries.holiday import Holiday, USMemorialDay,\
        AbstractHolidayCalendar, Nearest, MO, USFederalHolidayCalendar
from pandas.tseries.offsets import DateOffset, CustomBusinessDay
from datetime import datetime
class ExampleCalendar(AbstractHolidayCalendar):
    _rule_table = [
        USMemorialDay,
        Holiday('July 4th', month=7, day=4, observance=Nearest),
        Holiday('Columbus Day', month=10, day=1, 
            offset=DateOffset(weekday=MO(2))),
        ]
cal = ExampleCalendar()
cal.holidays()
print(datetime(2012, 5, 25) + CustomBusinessDay(calendar=cal))
print(datetime(2012, 5, 25) + CustomBusinessDay(calendar=USFederalHolidayCalendar()))
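The 4-day weekend observance mentioned in the description could be expressed as an observance rule that returns more than one date. The following is only a sketch in plain Python 3, not part of this PR: the function name `four_day_weekend` is hypothetical, and the Thursday/Friday case is an assumption added by symmetry with the Tuesday/Monday case the description gives.

```python
from datetime import date, timedelta


def four_day_weekend(dt):
    """Return every date observed for a fixed-date holiday.

    Hypothetical rule: a Tuesday holiday also closes the Monday before,
    and (assumed by symmetry) a Thursday holiday closes the Friday after.
    """
    if dt.weekday() == 1:  # Tuesday
        return [dt - timedelta(days=1), dt]
    if dt.weekday() == 3:  # Thursday
        return [dt, dt + timedelta(days=1)]
    return [dt]


# July 4th, 2017 fell on a Tuesday, so Monday July 3rd is included.
observed = four_day_weekend(date(2017, 7, 4))
```

A rule like this only works if the calendar accepts a list from the observance function, which the current implementation does not yet do.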

Implementation of holidays and holiday calendars to be used with the
CustomBusinessDay offset.  Also add an Easter holiday for use in
calendars.
class USFederalHolidayCalendar(AbstractHolidayCalendar):

    _rule_table = [
        Holiday('New Years Day', month=1, day=1, observance=Nearest),
Contributor

Why not factor all holidays out as constants like Memorial Day, etc so that they can be re-used by other HolidayCalendars?

Contributor Author

I did it for the holidays that shouldn't change observances. If you look at my two calendars I put there, the other holidays have different observances. There could be a default set, but I think caution needs to be taken when doing these types of holidays.

@jreback jreback added this to the 0.14.0 milestone Mar 28, 2014
@jreback
Contributor

jreback commented Mar 28, 2014

Can you add a new section after this one on how to use holidays? And in v0.14.0.txt (you can make a subsection if you need or just a bullet)

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects

and then include a short example in the top of this PR.

@jreback jreback added the Docs label Mar 28, 2014
@rockg
Contributor Author

rockg commented Mar 29, 2014

@jreback sections and examples added.

@jreback
Contributor

jreback commented Mar 29, 2014

@rockg a couple of suggestions / comments

  • Use rule_table rather than _rule_table. It's not internal, so the underscore isn't necessary.
  • I would have a name attribute for the AbstractHolidayCalendar (settable in the constructor, defaulting to the class name).
  • Since you end up having lots of calendars (well, I DO! Different countries and such), you need a nice way to access them. Maybe have a get_calendar(name=....) module-level function which could look up the globals(), or you could register classes in a module-level dict by their names (or both).
  • I think you may need an explicit method to pull in holiday rules from another class (rather than using multiple base classes), maybe calendar1.merge(calendar2)? The reason is that you may need a well-defined ordering for adding rules.
  • I have a concept of a holiday type (e.g. July 4th is an observed day, while July 3rd is sometimes a half-day holiday, depending on when it falls, and Veterans Day is a bank holiday). If these get merged (or subclassed together), they need an attribute to differentiate them. Not sure how to handle this, as what this really is is a conglomeration of 3 different calendars. Hmm, maybe too complicated for here.
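The name attribute and the module-level lookup suggested above might look roughly like this. It is a plain Python 3 sketch under assumptions: `register`, `_calendars`, `HolidayCalendarSketch` and `USExampleCalendar` are hypothetical names, and only `get_calendar` comes from the suggestion itself.

```python
# Hypothetical module-level registry; all names here are illustrative.
_calendars = {}


def register(cls):
    """Class decorator: store the calendar class under its class name."""
    _calendars[cls.__name__] = cls
    return cls


def get_calendar(name):
    """Return a new instance of the calendar registered under `name`."""
    return _calendars[name]()


class HolidayCalendarSketch:
    rules = []

    def __init__(self, name=None):
        # Default the name to the class name, as suggested above.
        self.name = name or type(self).__name__


@register
class USExampleCalendar(HolidayCalendarSketch):
    # Strings stand in for real Holiday rule objects.
    rules = ["Memorial Day", "July 4th"]


cal = get_calendar("USExampleCalendar")
```

An explicit registry avoids scanning globals(), at the cost of one decorator per calendar class.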

        USMemorialDay,
        Holiday('July 4th', month=7, day=4, observance=Nearest),
        Holiday('Columbus Day', month=10, day=1,
            offset=DateOffset(weekday=MO(2))),
Contributor

I think this offset can be Week(weekday='Monday')

Contributor Author

Needs to be Week(weekday=0) or some other integer. I think the example is illustrative enough. I'll put a comment pointing this alternative out.
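For reference, the rule being discussed (second Monday on or after October 1st) can be written out with the standard library alone. This is a sketch for illustration, not the pandas or dateutil implementation; `nth_weekday_on_or_after` is a hypothetical helper name.

```python
from datetime import date, timedelta


def nth_weekday_on_or_after(start, weekday, n):
    """Return the n-th `weekday` (Monday=0) on or after `start`."""
    days_ahead = (weekday - start.weekday()) % 7
    return start + timedelta(days=days_ahead + 7 * (n - 1))


# Columbus Day: second Monday on or after October 1st.
columbus_2014 = nth_weekday_on_or_after(date(2014, 10, 1), weekday=0, n=2)
```

Note that when the start date is itself a Monday it counts as the first one, which is the case where a plain "roll forward to the next Monday" rule can give a different answer.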

@rockg
Contributor Author

rockg commented Apr 1, 2014

@jreback changes incorporated.

  • Changed to rules rather than rule_table...simpler
  • Name incorporated
  • Created a metaclass that automatically registers the calendars to be callable by name.
  • Created a merge function as well as a HolidayCalendarFactory to create a new class out of existing holiday calendars, rules, or a combination of both
  • Your last bullet to me belongs in the observance function. Namely, when there is a holiday such that July 3rd is a holiday (maybe July 4th on a Friday) then the observance function should return both days. This is what I was envisioning for Nearest4 but I did not have a suitable example.
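As a rough illustration of the registration and merge ideas in this list (not the code in the PR), a metaclass can record each calendar class by name, and a merge can give later rules precedence. `CalendarMeta`, `BaseCalendar` and `merge_rules` are hypothetical names, and the (name, observance) tuples are stand-ins for real Holiday objects.

```python
class CalendarMeta(type):
    """Hypothetical metaclass that records every calendar class by name."""

    registry = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.registry[name] = cls
        return cls


class BaseCalendar(metaclass=CalendarMeta):
    rules = []


def merge_rules(base, other):
    """Merge two rule lists keyed by name; rules in `other` win on conflict."""
    merged = dict(base)
    merged.update(dict(other))
    return list(merged.items())


class CalendarA(BaseCalendar):
    rules = [("July 4th", "nearest_workday"), ("Labor Day", None)]


class CalendarB(BaseCalendar):
    rules = [("July 4th", "sunday_to_monday")]


combined = merge_rules(CalendarA.rules, CalendarB.rules)
```

Keying the merge on the holiday name is one way to get the well-defined ordering asked for earlier: a rule from the second calendar replaces a same-named rule from the first.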

@cancan101
Contributor

Let's say you want to represent the NYSE holiday calendar, which has full market closures and half days. How should that be represented with this?

@rockg
Contributor Author

rockg commented Apr 1, 2014

I think it depends on the type of holiday (fixed date--July 4th--versus floating date--Memorial Day). I think fixed date half days belong in the observance rule, but floating date half days belong in the offset.

For example, the Friday after Thanksgiving would be an offset, roughly DateOffset(weekday=TH(4))+DateOffset(1)+Hour(13). Other holidays, like July 3rd, would be an observance, for example:

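from pandas.tseries.holiday import Holiday
from pandas.tseries.offsets import Hour

def WeekdayHalfDay(dt):
    # Weekdays close at 13:00; weekends have no half day.
    if dt.isoweekday() < 6:
        return dt + Hour(13)
    else:
        return None

Holiday('July 3rd Half Day', month=7, day=3, observance=WeekdayHalfDay)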

or it can just be in July 4th's observance rule to return a list of dates.

Some changes would need to happen to accept None or a list return from an observance rule, but these are trivial.
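One way to handle a None or list return, sketched in plain Python 3 rather than against the PR's code, is to normalize whatever the observance rule returns into a list. `apply_observance` is a hypothetical helper, and `timedelta(hours=13)` stands in for the `Hour(13)` offset.

```python
from datetime import datetime, timedelta


def weekday_half_day(dt):
    """Return a 13:00 close on weekdays, or None on weekends."""
    if dt.isoweekday() < 6:
        return dt + timedelta(hours=13)
    return None


def apply_observance(dt, observance):
    """Always return a list, whatever the observance rule returned."""
    result = observance(dt)
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


# July 3rd, 2014 was a Thursday; July 3rd, 2016 was a Sunday.
thursday = apply_observance(datetime(2014, 7, 3), weekday_half_day)
sunday = apply_observance(datetime(2016, 7, 3), weekday_half_day)
```

With this shape the calling code can simply extend its result with the returned list and never needs to special-case None.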

To me the time would be the best way to represent such a thing unless you added an indicator and returned a Series/DataFrame with such information but that seems overkill. I'm assuming this would just be in timeseries analysis as I don't believe CustomBusinessDays can handle half days and I don't know what that means in a daily context.

@rockg
Contributor Author

rockg commented Apr 1, 2014

@jreback where should the default dates go for the holiday range in AbstractHolidayCalendar.holidays? A configuration file or somewhere else?

@jreback
Contributor

jreback commented Apr 1, 2014

@rockg I think they can stay in the main file, they are pretty common (though maybe take out the NERC one....). The user will almost certainly override / replace with their own. and just register it. In fact I think pandas should maybe not define ANY calendars (but it makes a nice easy example), so USFederalHolidays is fine. We don't really have a config file anywhere, so would be tricky.

@jreback
Contributor

jreback commented Apr 1, 2014

    # If holiday falls on Saturday, use following Monday instead;
    # if holiday falls on Sunday, use day thereafter (Monday) instead:
    def next_monday(self):
        new_self = self.copy()
        dow = new_self.data.weekday()
        if (dow == 5): new_self.data = new_self.data + datetime.timedelta(2)
        elif (dow == 6): new_self.data = new_self.data + datetime.timedelta(1)
        return new_self

    # For second holiday of two adjacent ones!
    # If holiday falls on Saturday, use following Monday instead;
    # if holiday falls on Sunday or Monday, use next Tuesday instead
    # (because Monday is already taken by adjacent holiday on the day before):
    def next_monday_or_tuesday(self):
        new_self = self.copy()
        dow = new_self.data.weekday()
        if (dow == 5): new_self.data = new_self.data + datetime.timedelta(2)
        elif (dow == 6): new_self.data = new_self.data + datetime.timedelta(2)
        elif (dow == 0): new_self.data = new_self.data + datetime.timedelta(1)
        return new_self

    # If holiday falls on Saturday or Sunday, use previous Friday instead:
    def previous_friday(self):
        new_self = self.copy()
        dow = new_self.data.weekday()
        if (dow == 5): new_self.data = new_self.data - datetime.timedelta(1)
        elif (dow == 6): new_self.data = new_self.data - datetime.timedelta(2)
        return new_self

    # go to the previous friday if not friday
    def previous_friday_full(self):
        new_self = self.copy()
        dow = new_self.data.weekday()

        if 4-dow > 0:
            new_self.data = new_self.data + datetime.timedelta(dow-3)
        return new_self

    # If holiday falls on Sunday, use day thereafter (Monday) instead:
    def sunday_to_monday(self):
        new_self = self.copy()
        dow = new_self.data.weekday()
        if (dow == 6): new_self.data = new_self.data + datetime.timedelta(1)
        return new_self

    # If holiday falls on Saturday, use day before (Friday) instead;
    # if holiday falls on Sunday, use day thereafter (Monday) instead:
    def nearest_workday(self):
        new_self = self.copy()
        dow = new_self.data.weekday()
        if (dow == 5): new_self.data = new_self.data - datetime.timedelta(1)
        elif (dow == 6): new_self.data = new_self.data + datetime.timedelta(1)
        return new_self

Here are some rules that I use (some might be worthwhile to incorporate).

Also, pls use lowercase for the rules (they are functions).
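The rules above are methods on an object that wraps a date in `.data`. As lowercase observance functions that take and return a date directly, a few of them might look like the following sketch. It mirrors the logic of the methods above but is not the code that was merged.

```python
from datetime import date, timedelta


def next_monday(dt):
    """Saturday or Sunday -> the following Monday."""
    if dt.weekday() == 5:
        return dt + timedelta(days=2)
    if dt.weekday() == 6:
        return dt + timedelta(days=1)
    return dt


def previous_friday(dt):
    """Saturday or Sunday -> the preceding Friday."""
    if dt.weekday() == 5:
        return dt - timedelta(days=1)
    if dt.weekday() == 6:
        return dt - timedelta(days=2)
    return dt


def sunday_to_monday(dt):
    """Sunday -> the following Monday; other days unchanged."""
    if dt.weekday() == 6:
        return dt + timedelta(days=1)
    return dt


def nearest_workday(dt):
    """Saturday -> Friday before; Sunday -> Monday after."""
    if dt.weekday() == 5:
        return dt - timedelta(days=1)
    if dt.weekday() == 6:
        return dt + timedelta(days=1)
    return dt
```

Returning the input unchanged on weekdays keeps each function safe to apply to any date a rule produces.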

@rockg
Contributor Author

rockg commented Apr 1, 2014

What I was referring to for default dates was the below. This seems wrong to me and seems like it should live in a configuration file or something similar.

        #FIXME: Where should the default limits exist?
        if start is None:
            start = datetime(1970, 1, 1)

        if end is None:
            end = datetime(2030, 12, 31)

@jreback
Contributor

jreback commented Apr 1, 2014

I would just set those as class variables on the Abstract class; they can always be overridden (you could also create some options for this, à la pd.set_option, but that might be overkill).
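A minimal sketch of that suggestion, assuming hypothetical attribute names `start_date` and `end_date` and a hypothetical helper `resolve_range` (none of these are confirmed by the PR):

```python
from datetime import datetime


class AbstractCalendarSketch:
    # Class-level defaults; subclasses may override them.
    start_date = datetime(1970, 1, 1)
    end_date = datetime(2030, 12, 31)

    def resolve_range(self, start=None, end=None):
        """Fall back to the class defaults when a bound is not given."""
        start = self.start_date if start is None else start
        end = self.end_date if end is None else end
        return start, end


class ShortRangeCalendar(AbstractCalendarSketch):
    # Override only the lower bound.
    start_date = datetime(2000, 1, 1)


default_range = AbstractCalendarSketch().resolve_range()
short_range = ShortRangeCalendar().resolve_range()
```

This removes the hard-coded dates from the method body without needing a configuration file.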

@jreback
Contributor

jreback commented Apr 1, 2014

I don't think that you should actually pre-generate this, maybe instead cache the generation on demand (a bit harder to track), but I think better

@rockg
Contributor Author

rockg commented Apr 1, 2014

I don't believe anything is pre-generated (I assume you are referring to holidays()). I wanted to add caching, but I figured there was already a mechanism in pandas (would need a pointer) or I can just add my own memoization.

@jreback
Contributor

jreback commented Apr 1, 2014

date caching is really tricky, but you can look in tseries/index.py.

just cache in the class
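Caching on demand could be as simple as memoizing the computed list per (start, end) range on the calendar object. The sketch below is an assumption about what "cache in the class" might mean, not the merged implementation; the placeholder rule (January 1st of each year) and the `computations` counter exist only to make the caching visible.

```python
from datetime import date


class CachingCalendarSketch:
    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts real computations, for illustration

    def _compute(self, start, end):
        self.computations += 1
        # Placeholder rule: January 1st of every year inside the range.
        candidates = [date(y, 1, 1) for y in range(start.year, end.year + 1)]
        return [d for d in candidates if start <= d <= end]

    def holidays(self, start, end):
        """Compute the holidays once per (start, end) and reuse them."""
        key = (start, end)
        if key not in self._cache:
            self._cache[key] = self._compute(start, end)
        return self._cache[key]


cal = CachingCalendarSketch()
first = cal.holidays(date(2012, 6, 1), date(2014, 12, 31))
second = cal.holidays(date(2012, 6, 1), date(2014, 12, 31))
```

A cache keyed this way goes stale if the rules are changed after the first call, so a real version would need to be cleared whenever rules are merged or replaced.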

@rockg
Contributor Author

rockg commented Apr 2, 2014

@jreback I'm confused about your previous_friday_full function. It's not doing what I expect based on the comment:

dt = datetime.datetime(2012, 1, 1)  # a Sunday, matching the output below
for i in range(7):
    dt1 = dt + datetime.timedelta(i)
    dt2 = previous_friday_full(dt1)
    print('%s (%i) -> %s (%i)' % (dt1, dt1.weekday(), dt2, dt2.weekday()))

2012-01-01 00:00:00 (6) -> 2012-01-01 00:00:00 (6)
2012-01-02 00:00:00 (0) -> 2011-12-30 00:00:00 (4)
2012-01-03 00:00:00 (1) -> 2012-01-01 00:00:00 (6)
2012-01-04 00:00:00 (2) -> 2012-01-03 00:00:00 (1)
2012-01-05 00:00:00 (3) -> 2012-01-05 00:00:00 (3)
2012-01-06 00:00:00 (4) -> 2012-01-06 00:00:00 (4)
2012-01-07 00:00:00 (5) -> 2012-01-07 00:00:00 (5)

@jreback
Contributor

jreback commented Apr 2, 2014

yeh...this is a weird one, only used for Japan holidays. It has to do with my concept of full/half days, e.g. certain days the market can be open (at the beginning of the year), but they don't count for holiday purposes (very weird).

would just take that one out....i just copied all of my defs...
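If the intent were literally what the code comment says ("go to the previous Friday if not Friday"), a version that behaves that way for every weekday could look like this. It is a sketch of that reading only, not the Japan-specific rule described above, and `previous_friday_strict` is a hypothetical name.

```python
from datetime import datetime, timedelta


def previous_friday_strict(dt):
    """Return dt if it is a Friday, else the most recent Friday before it."""
    # weekday(): Monday=0 ... Friday=4; the modulo gives days since Friday.
    return dt - timedelta(days=(dt.weekday() - 4) % 7)


# The same seven days used in the check above, starting Sunday 2012-01-01.
results = [previous_friday_strict(datetime(2012, 1, 1) + timedelta(days=i))
           for i in range(7)]
```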

@rockg
Contributor Author

rockg commented Apr 4, 2014

@jreback Think we are there. Please take a look (Travis failures are only TestYahoo).

        return 'Holiday: %s (%s)' % (self.name, info)

    def dates(self, start_date, end_date, return_name=False):
        '''
Contributor

I think you need a coercion step here, e.g. start_date = Timestamp(start_date) (and some tests, as I think a string/datetime/Timestamp could be passed).
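The idea of the coercion step, shown with the standard library as a stand-in for `Timestamp(...)`: accept a string, a date, or a datetime and normalize to one type before comparing. `coerce_datetime` is a hypothetical helper; in the PR itself the pandas Timestamp constructor would do this job.

```python
from datetime import date, datetime


def coerce_datetime(value):
    """Normalize an ISO string, date, or datetime to a datetime."""
    # Check datetime first: datetime is a subclass of date.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise TypeError("cannot coerce %r to datetime" % (value,))


start_date = coerce_datetime("2014-01-01")
end_date = coerce_datetime(date(2014, 12, 31))
```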

…dd additional

tests for functionality.  Also, add release note.
@rockg
Contributor Author

rockg commented Apr 6, 2014

@jreback Maybe now it's okay?

@jreback
Contributor

jreback commented Apr 6, 2014

yes...looks good...pls rebase and squash down a bit if you can

ping me when green

@rockg
Contributor Author

rockg commented Apr 7, 2014

All right, need some git help. When I try to squash everything together I get the below. Only answers I see talk about conflicts, but I have none that I know of. What determines whether a commit can be squashed or not?

error: could not apply f37742e... Convert map object to list for python 3.

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To check out the original branch and stop rebasing run "git rebase --abort".

Could not apply f37742e... Convert map object to list for python 3.

@immerrr
Contributor

immerrr commented Apr 7, 2014

Conflicts occur when an incoming commit has one or more modifications that overlap your own. Git refuses to choose how to combine those changes and leaves that to you.

Git will put all non-overlapping changes in the staging area, and the rest will remain in the working directory, wrapped in conflict markers so they stand out. If you run git diff, you may notice that the diff regions have not one but two columns of + and - characters to their left; each column shows the changes relative to one of the parent commits.

@jreback
Contributor

jreback commented Apr 7, 2014

@rockg I rebased and squashed...here it is

if it looks ok...can merge...

jreback@8e3bdfe

Can always do a follow up PR for fixes/corrections.

@rockg
Contributor Author

rockg commented Apr 7, 2014

@jreback, that looks great as far as I can tell.

How did you do it?

@jreback
Contributor

jreback commented Apr 7, 2014

git rebase -i master

git status reports which files are modified by both (the red ones)
fix those, save

git add --all
git rebase --continue

not sure exactly how you got in this state

my workflow is to do a commit
then every once in a while rebase on master

did you do any merges or anything?

@jreback
Contributor

jreback commented Apr 7, 2014

merged via 8e3bdfe

thanks!

check out docs when they are built (should be shortly)

http://pandas-docs.github.io/pandas-docs-travis/

@jreback jreback closed this Apr 7, 2014
@rockg
Contributor Author

rockg commented Apr 7, 2014

What you did makes sense, but that wasn't working. I have no idea how I got in that state. I did have some issues with some commits...conflicts and things that I wasn't expecting and maybe in resolving those I created the issues. Nevertheless, thank you. The hardest part about all this was making git do what I want (maybe 50% of the time) which doesn't seem like it should be the case.

@jreback
Contributor

jreback commented Apr 7, 2014

git is a beast at times. Here is a sample of @cpcloud's workflow: https://github.com/pydata/pandas/wiki/Git-Workflows

not entirely sure how you got in that state...no biggie though...

@jreback
Contributor

jreback commented Apr 7, 2014

docs are out

Can you do a follow up PR

I would move the in-line comments (the pound-sign ones) onto the line above the code they describe (easier to read).

also can you show an example of actually using the ExampleCalendar, e.g. by defining a CustomBusinessDay index and say adding offsets, to show how it skips weekends and holidays.

thanks

@jreback
Contributor

jreback commented Apr 7, 2014

@rockg I see you do have an example where you are adding the custom biz day, but maybe make it a tad longer
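To make concrete what the longer example should demonstrate, here is the skipping behaviour written out in plain Python 3. It is a conceptual sketch of what adding a custom business day does, not the pandas implementation; `add_business_days` is a hypothetical helper and the holiday set is supplied by hand.

```python
from datetime import date, timedelta


def add_business_days(start, n, holidays):
    """Step forward `n` business days, skipping weekends and holidays."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Friday 2012-05-25 is followed by a weekend and then Memorial Day (Monday).
memorial_day_2012 = date(2012, 5, 28)
next_day = add_business_days(date(2012, 5, 25), 1, {memorial_day_2012})
```

Without the holiday in the set, the same call lands on Monday the 28th, which is the difference a calendar makes to the offset.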
