# Let's make a summary table

## By Pontus Nordqvist, <p.nordq@gmail.com>

First, we need to load our pre-processed data. (Combining the two independent datasets is shown in the pre-processing part.)

In [1]:
import os
import pandas as pd

# Build the path to the cleaned CSV relative to the current working directory
filename = 'CleanHolidayData2016-2022'
data_path = os.path.join(os.getcwd(), 'Data', 'CleanedData', f'{filename}.csv')
holiday_df = pd.read_csv(data_path)

Let's group the data by three-letter ISO code (Alpha-3) and year, and store the result in summary_table.

In [2]:
summary_table = holiday_df.groupby(['Alpha-3 code','Year'])

In [3]:
summary_table.head()

Unnamed: 0.1,Unnamed: 0,Date,Holiday,Country,Alpha-2 code,Alpha-3 code,Numeric code,ISO 3166-2,Year
0,0,2016-01-01,Aña Nobo [New Year's Day],aruba,AW,ABW,533,ISO 3166-2:AW,2016
1,1,2016-01-25,Dia Di Betico [Betico Day],aruba,AW,ABW,533,ISO 3166-2:AW,2016
2,2,2016-02-08,Dialuna di Carnaval [Carnaval Monday],aruba,AW,ABW,533,ISO 3166-2:AW,2016
3,3,2016-03-18,Dia di Himno y Bandera [National A...,aruba,AW,ABW,533,ISO 3166-2:AW,2016
4,4,2016-03-25,Bierna Santo [Good Friday],aruba,AW,ABW,533,ISO 3166-2:AW,2016
...,...,...,...,...,...,...,...,...,...
6077,7035,2022-01-01,New Year's Day,south africa,ZA,ZAF,710,ISO 3166-2:ZA,2022
6078,7036,2022-04-15,Good Friday,south africa,ZA,ZAF,710,ISO 3166-2:ZA,2022
6079,7037,2022-04-18,Family Day,south africa,ZA,ZAF,710,ISO 3166-2:ZA,2022
6080,7038,2022-12-16,Day of Reconciliation,south africa,ZA,ZAF,710,ISO 3166-2:ZA,2022
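
A side note on the `Unnamed: 0` and `Unnamed: 0.1` columns in the output above: columns with names like these usually appear when a DataFrame index was written to CSV and then read back as ordinary data. Whether that is what happened in the pre-processing step is an assumption here. The sketch below reproduces the effect on a small, made-up frame (`toy` and its values are hypothetical) and shows that `index_col=0` avoids the extra column.

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for the holiday data.
toy = pd.DataFrame({'Holiday': ["New Year's Day", 'Good Friday'],
                    'Year': [2016, 2016]})

# Writing with the default index=True stores the index as an unnamed column.
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading it back without index_col turns that column into 'Unnamed: 0'.
buffer.seek(0)
with_extra = pd.read_csv(buffer)
print(list(with_extra.columns))

# Reading it back with index_col=0 restores the index instead.
buffer.seek(0)
without_extra = pd.read_csv(buffer, index_col=0)
print(list(without_extra.columns))
```

The extra columns are harmless for the summary below, since only `Alpha-3 code`, `Year` and `Holiday` are used.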


However, we only need the number of unique holidays per country and year. Let's compute that:

In [4]:
summary_table = summary_table['Holiday'].nunique()
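
To see what `nunique()` does compared with a plain row count, here is a minimal sketch on made-up data (the `toy` frame and its entries are hypothetical, not taken from the real dataset). A holiday listed twice for the same country and year is counted once.

```python
import pandas as pd

# Hypothetical toy data: 'Good Friday' is listed twice for ABW in 2016.
toy = pd.DataFrame({
    'Alpha-3 code': ['ABW', 'ABW', 'ABW', 'ZAF'],
    'Year': [2016, 2016, 2016, 2016],
    'Holiday': ['Good Friday', 'Good Friday', 'Betico Day', 'Family Day'],
})

grouped = toy.groupby(['Alpha-3 code', 'Year'])['Holiday']
row_counts = grouped.count()       # counts every row in the group
unique_counts = grouped.nunique()  # counts distinct holiday names only

print(row_counts)
print(unique_counts)
```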

In [5]:
summary_table.head(20)

Alpha-3 code  Year
ABW           2016    11
              2017    11
              2018    11
              2019    11
              2020    11
              2021    11
              2022    11
AGO           2016    16
              2017    16
              2018    18
              2019    15
              2020    16
              2021    18
              2022    16
ARE           2016    13
              2017    11
              2018    12
              2019    11
              2020    10
              2021    11
Name: Holiday, dtype: int64

### Wait, this is not a table. Let's make it a table

In [6]:
summary_table = summary_table.unstack(level=1)
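
What `unstack` does here is move one index level (the year) into the columns. A small sketch on made-up counts (the `counts` Series is hypothetical) shows this, and also what happens when a country has no entry for some year: the cell becomes `NaN` unless a `fill_value` is given.

```python
import pandas as pd

# Hypothetical counts indexed by (code, year); ZAF has no 2017 entry.
index = pd.MultiIndex.from_tuples(
    [('ABW', 2016), ('ABW', 2017), ('ZAF', 2016)],
    names=['Alpha-3 code', 'Year'])
counts = pd.Series([11, 11, 15], index=index, name='Holiday')

# Move the 'Year' level from the row index into the columns.
wide = counts.unstack(level='Year')
print(wide)

# Same reshape, but missing (code, year) pairs are filled with 0.
wide_filled = counts.unstack(level='Year', fill_value=0)
print(wide_filled)
```

Passing the level by name (`level='Year'`) is equivalent to `level=1` for this index and is a bit easier to read.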

In [7]:
print(summary_table)

Year          2016  2017  2018  2019  2020  2021  2022
Alpha-3 code                                          
ABW             11    11    11    11    11    11    11
AGO             16    16    18    15    16    18    16
ARE             13    11    12    11    10    11    11
ARG             16    17    17    17    17    16    17
AUS              8     8     7     8     9     9     9
...            ...   ...   ...   ...   ...   ...   ...
TUR             10    11    11    11    11    11    11
UKR             10    11    11    11    11    11    11
USA             11    12    11    10    11    13    11
VNM             11    12    11    11    11    11    12
ZAF             15    14    13    14    13    14    14

[68 rows x 7 columns]


Much better! :) It may be easier to inspect as a CSV file, so let's save it.

In [8]:
filename = 'SummaryTable2016-2022'
save_path = os.path.join(os.getcwd(), f'{filename}.csv')
summary_table.to_csv(save_path)
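
To check that a table like this survives the trip to disk, it can be read back in. The sketch below uses a small, made-up stand-in for `summary_table` and a temporary directory (the names `table`, `reloaded` and the file name are hypothetical). Note that the year column labels come back as strings, so they are converted to integers again.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for summary_table.
table = pd.DataFrame({2016: [11, 15], 2017: [11, 14]},
                     index=pd.Index(['ABW', 'ZAF'], name='Alpha-3 code'))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'SummaryTableExample.csv')
    table.to_csv(path)
    # index_col=0 restores the country codes as the index.
    reloaded = pd.read_csv(path, index_col=0)

# Column labels are read as strings ('2016'), so convert them back.
reloaded.columns = reloaded.columns.astype(int)
print(reloaded)
```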