# Handle Holidays in Pandas Time Series Analysis
We have downloaded the stock prices for Apple (ticker: AAPL) from 01 July 2017 to 21 July 2017 from the Yahoo Finance website. 
The data is downloaded as a CSV file. For some reason, the file does not contain the Date column. 


# Load the data

In [1]:
import pandas as pd
df = pd.read_csv("AAPL_Holiday.csv")
print(df)

# Observe that the date column is missing. We need that column for time series analysis

          Open        High         Low       Close   Adj Close    Volume
0   144.880005  145.300003  143.100006  143.500000  140.320023  14277800
1   143.690002  144.789993  142.720001  144.089996  140.896942  21569600
2   143.020004  143.500000  142.410004  142.729996  139.567093  24128800
3   142.899994  144.750000  142.899994  144.179993  140.984940  19201700
4   144.110001  145.949997  143.369995  145.059998  141.845444  21090600
5   144.729996  145.850006  144.380005  145.529999  142.305038  19781800
6   145.869995  146.179993  144.820007  145.740005  142.510376  24884500
7   145.500000  148.490005  145.440002  147.770004  144.495407  25199400
8   147.970001  149.330002  147.330002  149.039993  145.737244  20132100
9   148.820007  150.899994  148.570007  149.559998  146.245728  23793500
10  149.199997  150.130005  148.669998  150.080002  146.754242  17868800
11  150.479996  151.419998  149.949997  151.020004  147.673386  20923000
12  151.500000  151.740005  150.190002  150.339996 

# Generate the Datetime Index

In [2]:
dates = pd.date_range(start='07-01-2017', end='07-21-2017', freq='B')     # We wish to generate only business/ working dates and no weekends.
print(dates)

# Observe that 01 July and 02 July 2017 fell on a weekend, so these dates were skipped. 
# However, 04 July has been generated as well. This is a US national holiday, and the stock markets are closed on this date. 
# The stock prices we downloaded therefore have no entry for 04 July 2017.
# If we use these dates as the index of our stock data, we encounter the following error:
# [ValueError: Length mismatch: Expected axis has 14 elements, new values have 15 elements]

df.set_index(dates, inplace=True)


# WE THEREFORE NEED TO EXCLUDE ALL US HOLIDAYS. The existing freq='B' will not work, because it skips only weekends.
# We need to define a custom frequency.

DatetimeIndex(['2017-07-03', '2017-07-04', '2017-07-05', '2017-07-06',
               '2017-07-07', '2017-07-10', '2017-07-11', '2017-07-12',
               '2017-07-13', '2017-07-14', '2017-07-17', '2017-07-18',
               '2017-07-19', '2017-07-20', '2017-07-21'],
              dtype='datetime64[ns]', freq='B')


ValueError: Length mismatch: Expected axis has 14 elements, new values have 15 elements
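
The mismatch can be confirmed without the CSV file. The following is a minimal sketch that only rebuilds the plain business-day range for the same window and checks its length and whether 04 July is in it; the variable name `bdays` is introduced here for illustration.

```python
import pandas as pd

# Plain business-day range for the same window used above.
bdays = pd.date_range(start="2017-07-01", end="2017-07-21", freq="B")

# freq="B" skips weekends only, so it yields one date more than the 14 price rows.
print(len(bdays))

# The holiday is still part of the range.
print(pd.Timestamp("2017-07-04") in bdays)
```

Comparing `len(bdays)` with `len(df)` before calling `set_index` is a cheap way to catch this kind of problem early.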

# Let's use the US holiday calendar for this purpose. 

In [3]:
# Import appropriate libraries
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())            # We are considering only US Holidays here...
print(usb)  

# Generate dates using this custom frequency
dates = pd.date_range(start="07-01-2017", end="07-21-2017", freq=usb)   # Use the custom frequency
print(type(dates))

# Set the Index on the DataFrame
df.set_index(dates, inplace=True)                                       # inplace modifies the same DataFrame.
print(df)

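# Inference: Observe that this time no error was thrown, and the holiday of July 04, 2017 was appropriately accounted for. 
#            The US federal holidays are defined in USFederalHolidayCalendar. Note that federal holidays and stock exchange
#            holidays are not identical (for example, exchanges close on Good Friday, which is not a federal holiday),
#            so check the generated dates against your data. If you want to work with Indian or Chinese holidays,
#            you will have to create your own calendar class and provide the appropriate holidays. 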

<CustomBusinessDay>
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
                  Open        High         Low       Close   Adj Close  \
2017-07-03  144.880005  145.300003  143.100006  143.500000  140.320023   
2017-07-05  143.690002  144.789993  142.720001  144.089996  140.896942   
2017-07-06  143.020004  143.500000  142.410004  142.729996  139.567093   
2017-07-07  142.899994  144.750000  142.899994  144.179993  140.984940   
2017-07-10  144.110001  145.949997  143.369995  145.059998  141.845444   
2017-07-11  144.729996  145.850006  144.380005  145.529999  142.305038   
2017-07-12  145.869995  146.179993  144.820007  145.740005  142.510376   
2017-07-13  145.500000  148.490005  145.440002  147.770004  144.495407   
2017-07-14  147.970001  149.330002  147.330002  149.039993  145.737244   
2017-07-17  148.820007  150.899994  148.570007  149.559998  146.245728   
2017-07-18  149.199997  150.130005  148.669998  150.080002  146.754242   
2017-07-19  150.479996  151.419998  14
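
To see which dates a calendar actually removes, its `holidays()` method can be queried for the same window. The sketch below is an illustration under assumptions: `df_demo` is a stand-in DataFrame with 14 rows (the row count reported in the error message), not the real AAPL data, and the length check before assigning the index is just one possible safeguard.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

cal = USFederalHolidayCalendar()

# Holidays the calendar knows about inside the download window.
holidays = cal.holidays(start="2017-07-01", end="2017-07-21")
print(holidays)

# Build the index with the custom frequency.
usb = CustomBusinessDay(calendar=cal)
dates = pd.date_range(start="2017-07-01", end="2017-07-21", freq=usb)

# Stand-in for the real data: 14 rows, matching the error message above.
df_demo = pd.DataFrame({"Close": range(14)})

# Assign the index only when the lengths agree.
if len(dates) == len(df_demo):
    df_demo.index = dates
else:
    print(f"Length mismatch: {len(df_demo)} rows vs {len(dates)} dates")

print(df_demo.head(3))
```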

# Creating custom holiday Calendar class
For this, visit the pandas source code, available on GitHub: https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/holiday.py

There, search for the implementation of the "USFederalHolidayCalendar" class, which is a child of the "AbstractHolidayCalendar" class. 
Copy this class as a skeletal template for defining your own calendar class. 


In [7]:
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
class myBirthDayCalendar(AbstractHolidayCalendar):
    """
        A list of all birthdays which should be marked as Holidays in my Time Series Analysis date index
    """
    rules = [
        Holiday('Mom Birthday', month=4, day=17),
        Holiday('Dad Birthday', month=4, day=3)       
    ]

    
# Instantiate the above class and pass it as the calendar parameter of the CustomBusinessDay class 
obj = CustomBusinessDay(calendar=myBirthDayCalendar())
obj


<CustomBusinessDay>

In [8]:
# Now create new Date index while using the custom frequency (we created in the previous step)
pd.date_range(start="2019-04-01", end="2019-04-30", freq=obj)

# Observe that both 3rd April and 17th April are excluded, along with the weekends. 

DatetimeIndex(['2019-04-01', '2019-04-02', '2019-04-04', '2019-04-05',
               '2019-04-08', '2019-04-09', '2019-04-10', '2019-04-11',
               '2019-04-12', '2019-04-15', '2019-04-16', '2019-04-18',
               '2019-04-19', '2019-04-22', '2019-04-23', '2019-04-24',
               '2019-04-25', '2019-04-26', '2019-04-29', '2019-04-30'],
              dtype='datetime64[ns]', freq='C')
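
A custom calendar can also be inspected directly, which helps when the rule list grows. This is a small sketch, assuming the same two birthday rules as above; the class name `BirthdayCalendar` is used here only for illustration. Passing `return_name=True` to `holidays()` returns the holiday names indexed by date.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

class BirthdayCalendar(AbstractHolidayCalendar):
    """Same two rules as the calendar defined above."""
    rules = [
        Holiday("Mom Birthday", month=4, day=17),
        Holiday("Dad Birthday", month=4, day=3),
    ]

cal = BirthdayCalendar()

# Series of holiday names, indexed by the dates on which they fall in April 2019.
named = cal.holidays(start="2019-04-01", end="2019-04-30", return_name=True)
print(named)
```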

# Considering holidays that fall on weekends and are observed on the following Monday or the previous Friday

In [16]:
# Assume that my birthday is on 12th January, 2019. I want to mark this as a holiday. Since this was a weekend i.e. Saturday, 
# I wish to mark the previous Friday i.e. 11th January, 2019 as the holiday. 
# Again copy the class template and make suitable modifications:
# Observe that we added an additional param called 'observance'

from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday
class myBirthDayCalendar(AbstractHolidayCalendar):
    """
        A list of all birthdays which should be marked as Holidays in my Time Series Analysis date index
    """
    rules = [
        Holiday('My Birthday', month=1, day=12,  observance=nearest_workday)
                                               
    ]

    
# Instantiate the above class and pass it as the calendar parameter of the CustomBusinessDay class 
obj = CustomBusinessDay(calendar=myBirthDayCalendar())
obj

# Create the date index using the custom freq we created
pd.date_range(start='2019-01-01', end='2019-01-31', freq=obj)

# Observe that the 12th and 13th, being a weekend, were skipped. Also, since we wanted the nearest workday to the 12th [Saturday], i.e. 
# the 11th [Friday], to be OBSERVED as a holiday, 11th Jan, 2019 was skipped too.
# Had the birthday fallen on a Sunday, the nearest workday would have been the following Monday, and that Monday would have been observed as the holiday. 
# If you wish to be specific, you can use one of the following values for the parameter 'observance':
# >>> nearest_workday
# >>> previous_friday
# >>> next_monday


DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
               '2019-01-14', '2019-01-15', '2019-01-16', '2019-01-17',
               '2019-01-18', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-25', '2019-01-28', '2019-01-29',
               '2019-01-30', '2019-01-31'],
              dtype='datetime64[ns]', freq='C')
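
The observance functions are plain functions that take a date and return the date on which the holiday is observed, so they can be tried out on their own. A minimal sketch, using the Saturday from the example above and the Sunday that follows it:

```python
from datetime import datetime
from pandas.tseries.holiday import nearest_workday, previous_friday, next_monday

saturday = datetime(2019, 1, 12)
sunday = datetime(2019, 1, 13)

# nearest_workday: Saturday moves back to Friday, Sunday moves forward to Monday.
print(nearest_workday(saturday), nearest_workday(sunday))

# previous_friday: both weekend days move back to Friday.
print(previous_friday(saturday), previous_friday(sunday))

# next_monday: both weekend days move forward to Monday.
print(next_monday(saturday), next_monday(sunday))
```

Dates that already fall on a weekday are returned unchanged by all three functions.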

# Handling weekends in different countries 
In countries like Egypt, the weekend falls on Friday and Saturday instead of Saturday and Sunday. 
We need to handle these cross-country differences within the code. 

In [22]:
# For this we need to configure the CustomBusinessDay class and explicitly state which days are working days. 
# So in the case of Egypt, the working days will be Sun, Mon, Tue, Wed, Thu
# See the help of CustomBusinessDay via [Shift] + [Tab]. 

obj = CustomBusinessDay(weekmask = 'Sun Mon Tue Wed Thu')
obj

# Create a new date index for Jan 2019
pd.date_range(start='2019-01-01', end='2019-01-31', freq=obj)    

# Observe: According to the US calendar, 5th [Sat] and 6th [Sun] are the weekend
#          But according to the Egyptian calendar, 4th [Fri] and 5th [Sat] are the weekend, while 6th [Sun] is a working day


DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-06',
               '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
               '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
               '2019-01-17', '2019-01-20', '2019-01-21', '2019-01-22',
               '2019-01-23', '2019-01-24', '2019-01-27', '2019-01-28',
               '2019-01-29', '2019-01-30', '2019-01-31'],
              dtype='datetime64[ns]', freq='C')
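
One way to double-check a weekmask is to look at the weekday names of the generated dates. The sketch below assumes the same weekmask as above; the name `egypt_bday` is introduced here for illustration. It also shows that the offset can be added to a single timestamp to step to the next working day.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

egypt_bday = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
dates = pd.date_range(start="2019-01-01", end="2019-01-31", freq=egypt_bday)

# Only the five working days named in the weekmask should appear.
day_names = set(dates.day_name())
print(sorted(day_names))

# Stepping one working day forward from Thursday 3rd Jan skips Friday and Saturday.
next_day = pd.Timestamp("2019-01-03") + egypt_bday
print(next_day.day_name(), next_day.date())
```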

# Handling different weekends + Holidays in different countries. 

In [26]:
# In Egypt, Fri and Sat are the weekends - Make provision for that. 
# Also assume that say 2nd Jan, 2019 is a national holiday in Egypt. Make provision for it too. 
# Repeat the previous step, but pass the additional 'holidays' parameter. 
obj = CustomBusinessDay(weekmask = 'Sun Mon Tue Wed Thu', holidays = ['2019-01-02'])
obj

# Create a new date index for Jan 2019
pd.date_range(start='2019-01-01', end='2019-01-31', freq=obj)  

# Observe that all Sundays are now working days, while all Fridays and Saturdays are skipped as weekends. 
# Also note that 02 Jan, 2019 has been excluded as a holiday.  

DatetimeIndex(['2019-01-01', '2019-01-03', '2019-01-06', '2019-01-07',
               '2019-01-08', '2019-01-09', '2019-01-10', '2019-01-13',
               '2019-01-14', '2019-01-15', '2019-01-16', '2019-01-17',
               '2019-01-20', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-27', '2019-01-28', '2019-01-29',
               '2019-01-30', '2019-01-31'],
              dtype='datetime64[ns]', freq='C')
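
The weekmask can also be combined with a calendar class instead of a list of date strings, which is convenient when a holiday recurs every year. The sketch below is an illustration under assumptions: `ExampleNationalCalendar` and its single rule are hypothetical and do not represent real Egyptian holidays.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
from pandas.tseries.offsets import CustomBusinessDay

class ExampleNationalCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar: treats 2 January as a holiday every year."""
    rules = [Holiday("Example National Holiday", month=1, day=2)]

# Sunday-to-Thursday working week plus the holidays from the calendar class.
obj = CustomBusinessDay(
    weekmask="Sun Mon Tue Wed Thu",
    calendar=ExampleNationalCalendar(),
)

dates = pd.date_range(start="2019-01-01", end="2019-01-31", freq=obj)
print(dates)
```

For January 2019 this should produce the same dates as the `holidays=['2019-01-02']` version above, while also covering 2 January in other years.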