# Bank Holiday Dataset

I created a dataset of UK bank holidays using the England and Wales dates published on the official government website. Each month from January 2020 to December 2023 is flagged with 1 if at least one bank holiday fell in that month and 0 otherwise.

Source: https://www.gov.uk/bank-holidays
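The dates in the `bank_holidays` dictionary are entered by hand. As an alternative, they could be pulled programmatically. The sketch below assumes that gov.uk serves the same data as JSON at `https://www.gov.uk/bank-holidays.json`, keyed by division (`england-and-wales`, `scotland`, `northern-ireland`), with each division holding an `events` list of objects that carry a `date` string, and that the feed still covers 2020 to 2023. The helper names `fetch_bank_holidays` and `holidays_by_year` are hypothetical. The parsing step is demonstrated on a small hand-written payload so it runs offline.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: a JSON version of https://www.gov.uk/bank-holidays
BANK_HOLIDAYS_URL = "https://www.gov.uk/bank-holidays.json"


def fetch_bank_holidays(url: str = BANK_HOLIDAYS_URL) -> dict:
    """Download and decode the JSON feed (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def holidays_by_year(feed: dict, division: str = "england-and-wales",
                     years: range = range(2020, 2024)) -> dict[str, list[str]]:
    """Group ISO date strings by year, in the same shape as `bank_holidays`.

    Assumes the layout {division: {"events": [{"date": "YYYY-MM-DD", ...}, ...]}}.
    """
    grouped: dict[str, list[str]] = {str(y): [] for y in years}
    for event in feed[division]["events"]:
        year = event["date"][:4]  # "YYYY" prefix of the ISO date
        if year in grouped:       # ignore years outside the requested range
            grouped[year].append(event["date"])
    return {year: sorted(dates) for year, dates in grouped.items()}


# Offline example: a tiny hand-written payload in the assumed layout
sample_feed = {
    "england-and-wales": {
        "events": [
            {"title": "New Year's Day", "date": "2020-01-01"},
            {"title": "Good Friday", "date": "2020-04-10"},
            {"title": "Christmas Day", "date": "2019-12-25"},
        ]
    }
}
print(holidays_by_year(sample_feed, years=range(2020, 2021)))
# {'2020': ['2020-01-01', '2020-04-10']}
```

With network access, `holidays_by_year(fetch_bank_holidays())` would return a dictionary in the same shape as `bank_holidays`, ready for the monthly flagging step.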

```python
import pandas as pd

# Bank holidays in England and Wales, by year (source: https://www.gov.uk/bank-holidays)
bank_holidays = {
    "2023": ["2023-01-02", "2023-04-07", "2023-04-10", "2023-05-01", "2023-05-08", "2023-05-29", "2023-08-28", "2023-12-25", "2023-12-26"],
    "2022": ["2022-01-03", "2022-04-15", "2022-04-18", "2022-05-02", "2022-06-02", "2022-06-03", "2022-08-29", "2022-09-19", "2022-12-26", "2022-12-27"],
    "2021": ["2021-01-01", "2021-04-02", "2021-04-05", "2021-05-03", "2021-05-31", "2021-08-30", "2021-12-27", "2021-12-28"],
    "2020": ["2020-01-01", "2020-04-10", "2020-04-13", "2020-05-08", "2020-05-25", "2020-08-31", "2020-12-25", "2020-12-28"],
}

# First day of each month from January 2020 to December 2023 (48 months)
date_range = pd.date_range(start="2020-01-01", end="2023-12-01", freq="MS")

# Flag each month: 1 if any bank holiday in that year falls in that month, else 0
data = []
for date in date_range:
    year = str(date.year)
    holiday_months = {pd.to_datetime(h).month for h in bank_holidays.get(year, [])}
    data.append(1 if date.month in holiday_months else 0)

# Create DataFrame
df = pd.DataFrame({
    "Month": date_range,
    "Bank_Holiday": data
})

# Save to CSV without the integer index
df.to_csv("bank_holiday.csv", index=False)

# Display the dataset
df
```

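Output (first 10 of 48 rows):

|   | Month      | Bank_Holiday |
|---|------------|--------------|
| 0 | 2020-01-01 | 1 |
| 1 | 2020-02-01 | 0 |
| 2 | 2020-03-01 | 0 |
| 3 | 2020-04-01 | 1 |
| 4 | 2020-05-01 | 1 |
| 5 | 2020-06-01 | 0 |
| 6 | 2020-07-01 | 0 |
| 7 | 2020-08-01 | 1 |
| 8 | 2020-09-01 | 0 |
| 9 | 2020-10-01 | 0 |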
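The loop checks each month against the holidays listed for its year. The same flag can be computed without an explicit loop by converting both the month range and the holiday dates to monthly periods and testing membership. This is a minimal sketch using `pd.period_range`, `DatetimeIndex.to_period` and `Index.isin`; the helper name `monthly_holiday_flags` is hypothetical, and it takes a flat list of dates instead of the per-year dictionary.

```python
import pandas as pd


def monthly_holiday_flags(holiday_dates: list[str], start: str, end: str) -> pd.DataFrame:
    """Hypothetical helper: one row per month between start and end with a 0/1 flag."""
    # Every calendar month in the range, e.g. 2020-01 ... 2020-06
    months = pd.period_range(start=start, end=end, freq="M")
    # The calendar month (year included) that each holiday falls in
    holiday_months = pd.to_datetime(holiday_dates).to_period("M")
    return pd.DataFrame({
        "Month": months.to_timestamp(),  # first day of each month, like freq="MS"
        "Bank_Holiday": months.isin(holiday_months).astype(int),
    })


# Example with a few of the 2020 dates
flags = monthly_holiday_flags(["2020-01-01", "2020-04-10", "2020-05-08"], "2020-01", "2020-06")
print(flags)
```

To rebuild the full dataset, pass every date in `bank_holidays` (for example `[d for dates in bank_holidays.values() for d in dates]`) with `start="2020-01"` and `end="2023-12"`. Because a monthly period already encodes the year, grouping the dates by year is no longer necessary.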
