# The effects of extreme heat on health

The effects of extreme heat have been documented around the world (Office of the Chief Medical Officer of Health, 2022). Climate change will result in more extreme heat events in the future. Extreme weather events like heat waves can have significant economic and health impacts (Field, et al., 2014). Public health and emergency management officials in several Canadian communities are developing interventions to reduce heat-related health risks and prepare for the increase in frequency, duration, and severity of extreme heat events (Water, Air and Climate Change Bureau Healthy Environments and Consumer Safety Branch, 2012). An important adaptation step to protect people from extreme heat events is the development and implementation of a Heat Alert and Response System(s) (HARS) (Water, Air and Climate Change Bureau Healthy Environments and Consumer Safety Branch, 2012). 

The definition of extreme heat varies by region. In New Brunswick, HARS has three alert levels, and a heat event is classified on three factors: intensity, duration, and exposure at night (Office of the Chief Medical Officer of Health, 2022). A Level 1 alert is issued when the maximum temperature is 30 °C or higher for 2 consecutive days and the minimum night temperature is 18 °C or higher on the first night. A Level 1 alert is also issued if the humidex is 36 or higher for 2 consecutive days. A Level 2 alert is issued when all conditions of Level 1 are met and the humidex is between 40 and 44. A Level 3 alert is issued when all conditions of Level 1 are met and the humidex is 45 or above.
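The alert rules can be sketched as a small function that classifies a pair of consecutive days. This is only an illustration under stated assumptions: the `DailyReading` class and the `hars_level` function are hypothetical names that do not appear in the notebook, and the sketch assumes that either of the two days reaching the higher humidex band is enough for Level 2 or Level 3, as in the label summary under "Assign Heat labels".

```python
from dataclasses import dataclass


@dataclass
class DailyReading:
    """Hypothetical container for one station-day (not part of the notebook)."""
    temp_max: float        # daily maximum temperature (°C)
    night_temp_min: float  # minimum temperature during night hours (°C)
    humidex_max: float     # daily maximum humidex


def hars_level(day1: DailyReading, day2: DailyReading) -> int:
    """Return 0 (no alert) or the HARS level (1-3) for two consecutive days."""
    # Level 1, option a: two hot days and a warm first night
    hot_days = day1.temp_max >= 30 and day2.temp_max >= 30
    warm_first_night = day1.night_temp_min >= 18
    # Level 1, option b: humidex of 36 or more on both days
    humid_days = day1.humidex_max >= 36 and day2.humidex_max >= 36

    if not ((hot_days and warm_first_night) or humid_days):
        return 0

    # Levels 2 and 3 need Level 1 plus a higher humidex on either day
    peak_humidex = max(day1.humidex_max, day2.humidex_max)
    if peak_humidex >= 45:
        return 3
    if peak_humidex >= 40:
        return 2
    return 1
```

For example, `hars_level(DailyReading(31, 19, 35), DailyReading(30, 15, 34))` evaluates to 1, because both days reach 30 °C and the first night stays at or above 18 °C.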

To analyse the effects of extreme temperatures on New Brunswickers’ health, we will use climate and health data. We will use open data from Climate Canada to identify extreme temperature days and assign the correct HARS alert label to each of them. The climate data used is the hourly data from 2017 to 2022. In this part, we use variables such as maximum temperature, minimum night temperature, humidex, and number of consecutive days to define the HARS levels. Once the labelled climate data is linked with the health data, different statistical models can be used to identify the relationship between extreme heat and health outcomes or hospitalization. Statistical methods such as generalized linear models (Son, Bell, & Lee, 2014), distributed lag non-linear models (Sun, et al., 2021), time series analysis (Clemens, et al., 2022), and Poisson log-linear models (Chen, Sarnat, Grundstein, Winquist, & Chang, 2017) can be applied.

Process of climate data cleaning/pre-processing:
Temperature, humidex, and date columns are required to identify extreme heat events. Climate Canada makes daily climate data available. However, hourly data is more flexible because it allows new variables, such as night temperature, to be derived. Hourly data was therefore chosen and aggregated into a daily dataset. Because there are 1,233,614 records in 124 files, the initial download of the hourly files took some time. All 124 files were then merged into one dataset for analysis. Many variables were created, and others removed, to define extreme heat events/days based on the New Brunswick HARS levels.
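The merge step is not shown in this notebook, so the following is only a minimal sketch of how the downloaded files could be combined. The function name `merge_hourly_files` and the folder layout are assumptions; the only detail taken from the notebook is that the columns are read as strings.

```python
from pathlib import Path

import pandas as pd


def merge_hourly_files(folder, pattern="*.csv"):
    """Concatenate every CSV file in `folder` that matches `pattern`.

    Hypothetical helper: the notebook does not show how its 124 files
    were merged, so this is one possible approach.
    """
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    # Read every column as a string, as the import cell of the notebook does
    frames = [pd.read_csv(path, dtype="string") for path in files]
    # ignore_index gives the merged frame one continuous row index
    return pd.concat(frames, ignore_index=True)
```

The merged frame could then be written once with `to_csv(..., index=False)` and loaded by the import cell under "Analysis".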

Variables used from the climate dataset in creating HARS labels:

* STATION_NAME – City/Place name
* DATE_FORMATTED – Date of the temperature recorded.
* TEMP_MAX – Maximum temperature of the day.
* DAY_MAX_TEMP – Maximum temperature of the day during day hours.
* NIGHT_MAX_TEMP – Maximum temperature of the day during night hours.
* TEMP_MIN – Minimum temperature of the day.
* DAY_MIN_TEMP – Minimum temperature of the day during day hours.
* NIGHT_MIN_TEMP – Minimum temperature of the day during night hours.
* TEMP_MEAN – Average temperature of the day.
* DAY_MEAN_TEMP – Average temperature of the day during day hours.
* NIGHT_MEAN_TEMP – Average temperature of the day during night hours.
* HUMIDEX_MAX – Maximum humidex of the day.
* DAY_MAX_HUMIDEX – Maximum humidex of the day during day hours.
* NIGHT_MAX_HUMIDEX – Maximum humidex of the day during night hours.
* HUMIDEX_MIN – Minimum humidex of the day.
* DAY_MIN_HUMIDEX – Minimum humidex of the day during day hours.
* NIGHT_MIN_HUMIDEX – Minimum humidex of the day during night hours.
* HUMIDEX_MEAN – Average humidex of the day.
* DAY_MEAN_HUMIDEX – Average humidex of the day during day hours.
* NIGHT_MEAN_HUMIDEX – Average humidex of the day during night hours.
* TEMP_30 – Day when the temperature is above or equal to 30 degrees.
* TEMP_40 – Day when the temperature is above or equal to 40 degrees.
* NIGHT_18 – Day when the minimum night temperature is above or equal to 18 degrees.
* HUMIDEX_36 – Day when Humidex is above or equal to 36 degrees.
* HUMIDEX_40 – Day when Humidex is above or equal to 40 degrees.
* HUMIDEX_45 – Day when Humidex is above or equal to 45 degrees.
* GROUP_ALERT – Consecutive days in Level 1 and above alerts.
* LEVEL_1_ALERT – Day when conditions reach Level 1 alert.
* LEVEL_2_ALERT – Day when conditions reach Level 2 alert.
* LEVEL_3_ALERT – Day when conditions reach Level 3 alert.


## Analysis

In [None]:
import pandas as pd
import numpy as np
import math

In [None]:
#Import data 

dataset_nb_raw = pd.read_csv("data/climate-hourly-nb.csv", sep=',', encoding='unicode_escape', dtype={"x": "string","y": "string","STATION_NAME": "string","CLIMATE_IDENTIFIER": "string","ID": "string","LOCAL_DATE": "string","PROVINCE_CODE": "string","LOCAL_YEAR": "string","LOCAL_MONTH": "string","LOCAL_DAY": "string","LOCAL_HOUR": "string","TEMP": "string","TEMP_FLAG": "string","DEW_POINT_TEMP": "string","DEW_POINT_TEMP_FLAG": "string","HUMIDEX": "string","HUMIDEX_FLAG": "string","PRECIP_AMOUNT": "string","PRECIP_AMOUNT_FLAG": "string","RELATIVE_HUMIDITY": "string","RELATIVE_HUMIDITY_FLAG": "string","STATION_PRESSURE": "string","STATION_PRESSURE_FLAG": "string","VISIBILITY": "string","VISIBILITY_FLAG": "string","WINDCHILL": "string","WINDCHILL_FLAG": "string","WIND_DIRECTION": "string","WIND_DIRECTION_FLAG": "string","WIND_SPEED": "string","WIND_SPEED_FLAG": "string"})

dataset_nb = dataset_nb_raw.copy()
dataset_nb.head(5)

In [None]:
#Check for null values
dataset_nb.isna().sum()

### Exploratory Data Analysis

In [None]:
dataset_nb.shape

In [None]:
dataset_nb.info()

### Data Cleaning

In [None]:
#Create a copy of the dataset

dataset_nb_skim=dataset_nb.copy()

#Drop columns that we will not use
dataset_nb_skim = dataset_nb_skim.drop(['ID', 'PROVINCE_CODE', 'LOCAL_YEAR','LOCAL_MONTH','LOCAL_DAY','LOCAL_HOUR','PRECIP_AMOUNT','PRECIP_AMOUNT_FLAG','STATION_PRESSURE','STATION_PRESSURE_FLAG', 'VISIBILITY', 'VISIBILITY_FLAG', 'WINDCHILL', 'WINDCHILL_FLAG', 'WIND_DIRECTION', 'WIND_DIRECTION_FLAG', 'WIND_SPEED', 'WIND_SPEED_FLAG'], axis=1)

dataset_nb_skim.info()


In [None]:
#Two stations appear twice under different climate identifiers, so one set of records is removed for each (8103201 and 8100506 are dropped).
#MONCTON / GREATER MONCTON ROMEO LEBLANC INTL A - 8103202  and MONCTON/GREATER MONCTON ROMEO LEBLANC INTL A - 8103201
#BATHURST A - 8100506 and BATHURST A - 8100505

dataset_nb_skim.drop(dataset_nb_skim[dataset_nb_skim.CLIMATE_IDENTIFIER == "8103201"].index, inplace=True)
dataset_nb_skim.drop(dataset_nb_skim[dataset_nb_skim.CLIMATE_IDENTIFIER == "8100506"].index, inplace=True)

In [None]:
dataset_nb_skim[dataset_nb_skim.CLIMATE_IDENTIFIER == "8103201"].head(5)

In [None]:
#Convert some columns to numeric
dataset_nb_skim["TEMP"]=pd.to_numeric(dataset_nb_skim["TEMP"], errors='coerce')
dataset_nb_skim["HUMIDEX"]=pd.to_numeric(dataset_nb_skim["HUMIDEX"], errors='coerce')

#### Create a subset of data with temperature filters

In [None]:
#Sort the data by date
dataset_nb_skim.sort_values(by='LOCAL_DATE', inplace = True) 
dataset_nb_skim.head(5)

In [None]:
from datetime import datetime
from datetime import date

In [None]:
# Get date only from LOCAL_DATE column

def get_date_from_string(date_string,result_datetime):
    date_format = '%Y-%m-%d %H:%M:%S'
    parsed = datetime.strptime(date_string, date_format)
    if(result_datetime == "date"):
        return parsed.date()
    elif(result_datetime == "time"):
        return parsed.time()
    # Any other option would otherwise fail with an unassigned variable
    raise ValueError("result_datetime must be 'date' or 'time'")

In [None]:
#Define day time 

DayStart = "06:00:00"
DayEnd = "18:00:00"

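#Parse the day boundaries once instead of on every row
day_start_time = datetime.strptime(DayStart, '%H:%M:%S').time()
day_end_time = datetime.strptime(DayEnd, '%H:%M:%S').time()

#A record counts as day time when DayStart <= time < DayEnd; all other hours count as night
dataset_nb_skim["IS_DAY"] = dataset_nb_skim["LOCAL_DATE"].apply(lambda x: day_start_time <= get_date_from_string(x,"time") < day_end_time)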

dataset_nb_skim["DATE_FORMATTED"] = dataset_nb_skim["LOCAL_DATE"].apply(lambda x: get_date_from_string(x,"date"))

In [None]:
#Adjust Column position
dataset_nb_skim =dataset_nb_skim[["x","y","STATION_NAME","CLIMATE_IDENTIFIER","LOCAL_DATE","DATE_FORMATTED","IS_DAY","TEMP","TEMP_FLAG","DEW_POINT_TEMP","DEW_POINT_TEMP_FLAG","HUMIDEX","HUMIDEX_FLAG","RELATIVE_HUMIDITY","RELATIVE_HUMIDITY_FLAG"]]
dataset_nb_skim.head(5)

In [None]:
dataset_nb_skim.shape

In [None]:
dataset_nb_skim.isna().sum()

#### Getting min, mean, max values for Temperature

In [None]:
Min_Max_Mean_Temp = dataset_nb_skim.groupby(['STATION_NAME','DATE_FORMATTED','IS_DAY'])['TEMP'].aggregate(['min','max','mean'])
Min_Max_Mean_Temp.head(5)
#https://sparkbyexamples.com/pandas/pandas-groupby-multiple-columns/

In [None]:
Min_Max_Mean_Temp = Min_Max_Mean_Temp.pivot_table(index=["STATION_NAME","DATE_FORMATTED"], columns=['IS_DAY'],values=['min','max','mean'])

#Flatten the Indexes
Min_Max_Mean_Temp = pd.DataFrame(Min_Max_Mean_Temp.to_records())

#Fill NA
Min_Max_Mean_Temp = Min_Max_Mean_Temp.fillna("")

#Rename Columns
Min_Max_Mean_Temp.rename(columns={"('max', False)": 'NIGHT_MAX_TEMP', "('min', False)": 'NIGHT_MIN_TEMP', "('max', True)": 'DAY_MAX_TEMP', "('min', True)": 'DAY_MIN_TEMP', "('mean', True)": 'DAY_MEAN_TEMP', "('mean', False)": 'NIGHT_MEAN_TEMP'}, inplace=True)

In [None]:
#Type Numeric

cols = ["NIGHT_MAX_TEMP","DAY_MAX_TEMP","NIGHT_MEAN_TEMP","DAY_MEAN_TEMP","NIGHT_MIN_TEMP","DAY_MIN_TEMP"]
Min_Max_Mean_Temp[cols] = Min_Max_Mean_Temp[cols].apply(pd.to_numeric, errors='coerce', axis=1)

#Round to 1 decimal point

Min_Max_Mean_Temp = Min_Max_Mean_Temp.round({"NIGHT_MAX_TEMP": 1, "DAY_MAX_TEMP": 1, "NIGHT_MEAN_TEMP": 1, "DAY_MEAN_TEMP": 1, "NIGHT_MIN_TEMP": 1, "DAY_MIN_TEMP": 1})

#### Create new columns for Temperature MIN, MAX, MEAN

In [None]:
Min_Max_Mean_Temp["TEMP_MIN"] = Min_Max_Mean_Temp[['NIGHT_MIN_TEMP','DAY_MIN_TEMP']].min(axis=1)
Min_Max_Mean_Temp["TEMP_MAX"] = Min_Max_Mean_Temp[['NIGHT_MAX_TEMP','DAY_MAX_TEMP']].max(axis=1)
Min_Max_Mean_Temp["TEMP_MEAN"] = Min_Max_Mean_Temp[['NIGHT_MEAN_TEMP','DAY_MEAN_TEMP']].mean(axis=1)

#Change columns position
Min_Max_Mean_Temp =Min_Max_Mean_Temp[[ "STATION_NAME", "DATE_FORMATTED", "TEMP_MAX", "DAY_MAX_TEMP", "NIGHT_MAX_TEMP", "TEMP_MIN", "DAY_MIN_TEMP", "NIGHT_MIN_TEMP", "TEMP_MEAN", "DAY_MEAN_TEMP", "NIGHT_MEAN_TEMP"]]

#Fill NA
Min_Max_Mean_Temp = Min_Max_Mean_Temp.fillna("")
Min_Max_Mean_Temp.head(5)

In [None]:
Min_Max_Mean_Temp.shape

#### Getting min, mean, max values for Humidex

In [None]:
Min_Max_Mean_Humidex = dataset_nb_skim.groupby(['STATION_NAME','DATE_FORMATTED','IS_DAY'])['HUMIDEX'].aggregate(['min','max','mean'])
Min_Max_Mean_Humidex

In [None]:
Min_Max_Mean_Humidex = Min_Max_Mean_Humidex.pivot_table(index=["STATION_NAME","DATE_FORMATTED"], columns=['IS_DAY'],values=['min','max','mean'])

#Flatten the Indexes
Min_Max_Mean_Humidex = pd.DataFrame(Min_Max_Mean_Humidex.to_records())

#Rename Columns
Min_Max_Mean_Humidex.rename(columns={"('max', False)": 'NIGHT_MAX_HUMIDEX', "('min', False)": 'NIGHT_MIN_HUMIDEX', "('max', True)": 'DAY_MAX_HUMIDEX', "('min', True)": 'DAY_MIN_HUMIDEX', "('mean', True)": 'DAY_MEAN_HUMIDEX', "('mean', False)": 'NIGHT_MEAN_HUMIDEX'}, inplace=True)

#Type Numeric
cols = ["NIGHT_MAX_HUMIDEX","DAY_MAX_HUMIDEX","NIGHT_MEAN_HUMIDEX","DAY_MEAN_HUMIDEX","NIGHT_MIN_HUMIDEX","DAY_MIN_HUMIDEX"]
Min_Max_Mean_Humidex[cols] = Min_Max_Mean_Humidex[cols].apply(pd.to_numeric, errors='coerce', axis=1)

#Round to 1 decimal point
Min_Max_Mean_Humidex = Min_Max_Mean_Humidex.round({"NIGHT_MAX_HUMIDEX": 1, "DAY_MAX_HUMIDEX": 1, "NIGHT_MEAN_HUMIDEX": 1, "DAY_MEAN_HUMIDEX": 1, "NIGHT_MIN_HUMIDEX": 1, "DAY_MIN_HUMIDEX": 1})

#### Create new columns for Humidex MIN, MAX, MEAN

In [None]:
Min_Max_Mean_Humidex["HUMIDEX_MIN"] = Min_Max_Mean_Humidex[['NIGHT_MIN_HUMIDEX','DAY_MIN_HUMIDEX']].min(axis=1)
Min_Max_Mean_Humidex["HUMIDEX_MAX"] = Min_Max_Mean_Humidex[['NIGHT_MAX_HUMIDEX','DAY_MAX_HUMIDEX']].max(axis=1)
Min_Max_Mean_Humidex["HUMIDEX_MEAN"] = Min_Max_Mean_Humidex[['NIGHT_MEAN_HUMIDEX','DAY_MEAN_HUMIDEX']].mean(axis=1)

#Change columns position
Min_Max_Mean_Humidex =Min_Max_Mean_Humidex[[ "STATION_NAME", "DATE_FORMATTED", "HUMIDEX_MAX", "DAY_MAX_HUMIDEX", "NIGHT_MAX_HUMIDEX", "HUMIDEX_MIN", "DAY_MIN_HUMIDEX", "NIGHT_MIN_HUMIDEX", "HUMIDEX_MEAN", "DAY_MEAN_HUMIDEX", "NIGHT_MEAN_HUMIDEX"]]

#Fill NA
Min_Max_Mean_Humidex = Min_Max_Mean_Humidex.fillna("")
Min_Max_Mean_Humidex.head(5)

In [None]:
Min_Max_Mean_Humidex.shape

#### Combine TEMP and HUMIDEX

In [None]:
# Temp_Humidex is the main dataframe that will hold daily records with new variables.

Temp_Humidex = pd.merge(Min_Max_Mean_Temp, Min_Max_Mean_Humidex,  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])

Temp_Humidex = Temp_Humidex.fillna("")

Temp_Humidex.shape

In [None]:
cols = ["TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX"]
Temp_Humidex[cols] = Temp_Humidex[cols].apply(pd.to_numeric, errors='coerce', axis=1)

### Assign Heat labels

* L1 -> "Tmax >=30°C for 2+ days and Tmin* >=18°C (1st night only) Or Humidex >= 36 for 2+ days"
* L2 -> "HARS level 1 criteria is met plus either of the 2 days reaches Humidex 40 – 44"
* L3 -> "HARS level 1 criteria is met plus either of the 2 days reaches Humidex >= 45"

#### Level 1 label
##### Level 1a label

The first way to reach a Level 1 alert is a maximum temperature of 30 degrees or higher for 2 consecutive days, with a minimum night temperature of 18 degrees or higher on the first night.

In [None]:
temperature_l1_day = 30

condition1L1a = Temp_Humidex['TEMP_MAX'] >= temperature_l1_day

Temp_Humidex_L1a = Temp_Humidex[(condition1L1a)].copy()
Temp_Humidex_L1a.reset_index(drop=True, inplace=True)
Temp_Humidex_L1a.head(5)

#### Consecutive Dates check

In [None]:

day_count_check_consec = 0
status_check_consec = "Hot"
prev_date_check_consec = "2016-12-31"
format_date_check_consec = '%Y-%m-%d'
prev_date_check_consec = datetime.strptime(prev_date_check_consec, format_date_check_consec).date()
station_name_check_consec = ""

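def check_consecutive(station, date_recorded,duration):
    # Returns a "Group_<n>" label shared by rows of the same station on consecutive dates.
    # Rows must already be sorted by station and date; the state is kept in the globals below.
    # Note: the duration argument is currently not used.
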
    global day_count_check_consec
    global prev_date_check_consec
    global format_date_check_consec
    global station_name_check_consec
    global status_check_consec

    if(station != station_name_check_consec):
        day_count_check_consec = day_count_check_consec+1
        status_check_consec = "Group_" +str(day_count_check_consec)
        station_name_check_consec = station
        prev_date_check_consec = date_recorded
    else:
        days_diff = date_recorded - prev_date_check_consec
        prev_date_check_consec = date_recorded
        if(days_diff.days == 1):
            status_check_consec = "Group_" +str(day_count_check_consec)
        else:
            day_count_check_consec = day_count_check_consec+1
            status_check_consec = "Group_"+str(day_count_check_consec)
            
    return status_check_consec
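
The same grouping can also be expressed without global state. The sketch below is an alternative, not the method used in this notebook: `label_consecutive_runs` is a hypothetical helper that numbers each run of consecutive dates per station with pandas operations, assuming the frame has a unique index and the column names used here.

```python
import pandas as pd


def label_consecutive_runs(df, station_col="STATION_NAME", date_col="DATE_FORMATTED"):
    """Return a Series of run ids aligned with df.index.

    A new run starts when the station changes or when the gap to the
    previous date at the same station is not exactly one day.
    """
    ordered = df.sort_values([station_col, date_col])
    dates = pd.to_datetime(ordered[date_col])
    # Gap in days to the previous row of the same station (NaN for the first row)
    gap_days = dates.groupby(ordered[station_col]).diff().dt.days
    # NaN != 1 is True, so the first row of every station opens a new run
    new_run = gap_days.ne(1)
    run_id = new_run.cumsum()
    # Return the ids in the original row order
    return run_id.reindex(df.index)
```

With `runs = label_consecutive_runs(frame)`, the run lengths follow from `runs.groupby(runs).transform("count")`, which plays the same role as the `consecutive_new` column.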
            

In [None]:
#How many consecutive days to check 
Condition_1_duration_l1 = 2

Temp_Humidex_L1a["consecutive"] = Temp_Humidex_L1a.apply(lambda x: check_consecutive(x["STATION_NAME"],x["DATE_FORMATTED"],Condition_1_duration_l1),axis=1)

#Count of consecutive days
Temp_Humidex_L1a["consecutive_new"] = Temp_Humidex_L1a.groupby('consecutive')['consecutive'].transform('count')

Temp_Humidex_L1a.head(5)

##### Special condition for one night min temperature

In [None]:
# Set the minimum temperature and other variables for first night to check for L1 alert 

night_one_min_temperature = 18
set_duration_required = 1
consecutive_days_limit = 2
level_message="Level 1"

In [None]:
set_number = ""
set_duration_reset = 1
temperature_met = False

def check_night_temp(temperature_to_match, set_number_l1, number_of_sets):
    global set_number
    global set_duration_required
    global set_duration_reset
    global temperature_met
    
    level_message_update = level_message
    
    if(set_number != set_number_l1):
        set_number = set_number_l1
        set_duration_reset = set_duration_required
        temperature_met = False

    if(number_of_sets >=consecutive_days_limit):
        if(temperature_to_match >= night_one_min_temperature and set_duration_reset < number_of_sets):
            set_duration_reset = set_duration_reset+1
            temperature_met = True
            level_message_update = level_message
        elif(temperature_met == True):
            set_duration_reset = set_duration_reset+1
            level_message_update = level_message
        else:
            set_duration_reset = set_duration_reset+1
            level_message_update = ""
    else:
        level_message_update = ""

    return level_message_update


In [None]:
#Check the night temperature condition and create the L1a alert

Temp_Humidex_L1a["LEVEL_1a_ALERT"] = Temp_Humidex_L1a.apply(lambda x: check_night_temp(x["NIGHT_MIN_TEMP"],x["consecutive"],x["consecutive_new"]),axis=1)

#Merge L1a to Main dataframe
Temp_Humidex = pd.merge(Temp_Humidex, Temp_Humidex_L1a[['STATION_NAME','DATE_FORMATTED','LEVEL_1a_ALERT']],  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])

##### Level 1b label

The second way to reach a Level 1 alert is a humidex of 36 or higher for 2 consecutive days.

In [None]:
# Variable for l1 humidex value
temperature_l1_humidex = 36
condition1L1b = Temp_Humidex['HUMIDEX_MAX'] >= temperature_l1_humidex

Temp_Humidex_L1b = Temp_Humidex[(condition1L1b)].copy()
Temp_Humidex_L1b.reset_index(drop=True, inplace=True)
Temp_Humidex_L1b.head(5)

In [None]:
# Create L1b alert column

#Number of consecutive days required for the L1b (humidex) condition
Condition_L1b_duration = 2

Temp_Humidex_L1b["consecutive"] = Temp_Humidex_L1b.apply(lambda x: check_consecutive(x["STATION_NAME"],x["DATE_FORMATTED"],Condition_L1b_duration),axis=1)

Temp_Humidex_L1b["consecutive_new"] = Temp_Humidex_L1b.groupby('consecutive')['consecutive'].transform('count')

Temp_Humidex_L1b["LEVEL_1b_ALERT"] = Temp_Humidex_L1b.apply(lambda x: "Level 1" if x["consecutive_new"] >= Condition_L1b_duration else "",axis=1)

Temp_Humidex_L1b.head(5)

In [None]:
Temp_Humidex = pd.merge(Temp_Humidex, Temp_Humidex_L1b[['STATION_NAME','DATE_FORMATTED','LEVEL_1b_ALERT']],  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])
Temp_Humidex.head(5)

In [None]:
#Merge L1A and L1B

Temp_Humidex = Temp_Humidex.fillna("")

Temp_Humidex["LEVEL_1_ALERT"] = Temp_Humidex.apply(lambda x: "Level 1" if x["LEVEL_1a_ALERT"] == "Level 1" or x["LEVEL_1b_ALERT"] == "Level 1" else "",axis=1)

Temp_Humidex = Temp_Humidex.drop(["LEVEL_1a_ALERT","LEVEL_1b_ALERT"], axis=1)


#Create a Level 1 Dataset Filter

condition1L1 = Temp_Humidex['LEVEL_1_ALERT'] == "Level 1"

Temp_Humidex_L1 = Temp_Humidex[(condition1L1)].copy()
Temp_Humidex_L1.head(5)

In [None]:
#Create groups for L1 alerts. Each group is one run of consecutive L1 days at a station, which makes it possible to count how many sets of alert days were assigned.

Temp_Humidex_L1["GROUP_ALERT"] = Temp_Humidex_L1.apply(lambda x: check_consecutive(x["STATION_NAME"],x["DATE_FORMATTED"],2),axis=1) 

#Number of complete sets of Condition_1_duration_l1 days in each group (computed once per group instead of once per row)
Temp_Humidex_L1["COUNT_SET"] = Temp_Humidex_L1.groupby('GROUP_ALERT')['GROUP_ALERT'].transform('count') // Condition_1_duration_l1

#Merge new column to Temp Humidex
Temp_Humidex = pd.merge(Temp_Humidex, Temp_Humidex_L1[['STATION_NAME','DATE_FORMATTED','GROUP_ALERT','COUNT_SET']],  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])
Temp_Humidex.head(5)


In [None]:
#Temp_Humidex.drop_duplicates(inplace=True)

#### Level 2 label

A Level 2 alert is assigned when all Level 1 conditions are met and the humidex is between 40 and 44.

In [None]:
temperature_range_l2 = [40,45]

condition1L2 = Temp_Humidex_L1['HUMIDEX_MAX'] < temperature_range_l2[1]
condition2L2 = Temp_Humidex_L1['HUMIDEX_MAX'] >= temperature_range_l2[0]

Temp_Humidex_L2 = Temp_Humidex_L1[(condition1L2) & (condition2L2)].copy()
Temp_Humidex_L2.reset_index(drop=True, inplace=True)

Temp_Humidex_L2.head(5)

In [None]:
Condition_L2_duration_days = 2

Temp_Humidex_L2["consecutive"] = Temp_Humidex_L2.apply(lambda x: check_consecutive(x["STATION_NAME"],x["DATE_FORMATTED"],Condition_L2_duration_days),axis=1)
Temp_Humidex_L2["consecutive_new"] = Temp_Humidex_L2.groupby('consecutive')['consecutive'].transform('count')

#### Check if one of the Level 1 days meets the Level 2 or Level 3 criteria

In [None]:
temperature_to_check = temperature_range_l2
set_duration_required = 1
consecutive_days_limit = 1
level_message="Level 2"

set_number = ""
set_duration_reset = 1

def check_temp_instance(temperature_to_match, set_number_l1, number_of_sets):
    
    #A list is treated as the range [low, high), matching the Level 2 filter; a single number is a lower bound
    if isinstance(temperature_to_check, list): 
        temp_condition = (temperature_to_match >= temperature_to_check[0] and temperature_to_match < temperature_to_check[1])
    else:
        temp_condition = temperature_to_match >= temperature_to_check

    global set_number
    global set_duration_required
    global set_duration_reset

    if(set_number != set_number_l1):
        set_number = set_number_l1
        set_duration_reset = set_duration_required

    if(number_of_sets >=consecutive_days_limit and temp_condition and set_duration_reset <= set_duration_required):
        set_duration_reset = set_duration_reset+1
        return level_message
    elif(number_of_sets >=consecutive_days_limit and set_duration_reset > set_duration_required):
        return level_message
    else:
        return ""


In [None]:
#Add Level 2 alert

Temp_Humidex_L2["LEVEL_2_ALERT"] = Temp_Humidex_L2.apply(lambda x: check_temp_instance(x["HUMIDEX_MAX"],x["consecutive"],x["consecutive_new"]),axis=1)

Temp_Humidex = pd.merge(Temp_Humidex, Temp_Humidex_L2[['STATION_NAME','DATE_FORMATTED','LEVEL_2_ALERT']],  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])

# Convert the data type to numeric
cols = ["TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX"]
Temp_Humidex[cols] = Temp_Humidex[cols].apply(pd.to_numeric, errors='coerce', axis=1)

#### Level 3 label

A Level 3 alert is assigned when all Level 1 conditions are met and the humidex is 45 or above.

In [None]:
#Create the Level 1 dataset filter again

condition1L1 = Temp_Humidex['LEVEL_1_ALERT'] == "Level 1"

Temp_Humidex_L1 = Temp_Humidex[(condition1L1)].copy()

In [None]:
temperature_range_l3 = 45
condition1L3 = Temp_Humidex_L1['HUMIDEX_MAX'] >= temperature_range_l3

Temp_Humidex_L3 = Temp_Humidex_L1[(condition1L3)].copy()
Temp_Humidex_L3.reset_index(drop=True, inplace=True)
Temp_Humidex_L3.head(5)

In [None]:
Condition_L3_duration_days = 2

Temp_Humidex_L3["consecutive"] = Temp_Humidex_L3.apply(lambda x: check_consecutive(x["STATION_NAME"],x["DATE_FORMATTED"],Condition_L3_duration_days),axis=1)

Temp_Humidex_L3["consecutive_new"] = Temp_Humidex_L3.groupby('consecutive')['consecutive'].transform('count')

In [None]:
temperature_to_check = temperature_range_l3
level_message="Level 3"

Temp_Humidex_L3["LEVEL_3_ALERT"] = Temp_Humidex_L3.apply(lambda x: check_temp_instance(x["HUMIDEX_MAX"],x["consecutive"],x["consecutive_new"]),axis=1)

Temp_Humidex = pd.merge(Temp_Humidex, Temp_Humidex_L3[['STATION_NAME','DATE_FORMATTED','LEVEL_3_ALERT']],  how='left', left_on=['STATION_NAME','DATE_FORMATTED'], right_on = ['STATION_NAME','DATE_FORMATTED'])

cols = ["TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX"]
Temp_Humidex[cols] = Temp_Humidex[cols].apply(pd.to_numeric, errors='coerce', axis=1)

Temp_Humidex.head(5)

In [None]:
Temp_Humidex.describe(exclude='object')

In [None]:
Temp_Humidex.shape

In [None]:
Temp_Humidex.info()

### Unique Values

In [None]:
def unique_values__or_count(listOfColumns,options,dataset):
    for x in range(0, len(listOfColumns), 1):
        if(options=="unique"):
            unique_values_str = dataset[listOfColumns[x]].unique()
            print("unique_values " + listOfColumns[x])
            print(unique_values_str)
            print("------------------------")
        if(options=="count"):
            values_distribution = dataset[listOfColumns[x]].value_counts()
            print("-----------"+listOfColumns[x] +"------------")
            print(values_distribution)
            print("-----------------------")

In [None]:
unique_values__or_count(['STATION_NAME','LEVEL_1_ALERT','LEVEL_2_ALERT','LEVEL_3_ALERT'],"count",Temp_Humidex)

#### Extra columns for single day heat information

In [None]:
# These columns can be useful for visualizations. These are also useful to define if a day had extreme heat but didn't meet the L1, L2, L3 criteria.

Temp_Humidex["TEMP_30"] = Temp_Humidex.apply(lambda x: "Yes" if x["TEMP_MAX"] >= 30 else "No",axis=1)
Temp_Humidex["TEMP_40"] = Temp_Humidex.apply(lambda x: "Yes" if x["TEMP_MAX"] >= 40 else "No",axis=1)
Temp_Humidex["NIGHT_18"] = Temp_Humidex.apply(lambda x: "Yes" if x["NIGHT_MIN_TEMP"] >= 18 else "No",axis=1)
Temp_Humidex["HUMIDEX_36"] = Temp_Humidex.apply(lambda x: "Yes" if x["HUMIDEX_MAX"] >= 36 else "No",axis=1)
Temp_Humidex["HUMIDEX_40"] = Temp_Humidex.apply(lambda x: "Yes" if x["HUMIDEX_MAX"] >= 40 else "No",axis=1)
Temp_Humidex["HUMIDEX_45"] = Temp_Humidex.apply(lambda x: "Yes" if x["HUMIDEX_MAX"] >= 45 else "No",axis=1)

In [None]:
Temp_Humidex = Temp_Humidex.fillna("")

In [None]:
unique_values__or_count(['TEMP_30','TEMP_40','NIGHT_18','HUMIDEX_36','HUMIDEX_40','HUMIDEX_45'],"count",Temp_Humidex)

#### Filtering Winter Months

The cell below keeps only the records from April through August. A similar filter could also be applied to the 'dataset_nb' dataset at the beginning, which would make the processing faster.
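
As a rough sketch of such an early filter, the hypothetical helper below keeps only the hourly rows whose month falls in a given range. It assumes the `LOCAL_DATE` strings follow the `YYYY-MM-DD HH:MM:SS` format used elsewhere in this notebook; `keep_months` is not part of the original code.

```python
import pandas as pd


def keep_months(hourly, first_month=4, last_month=8, date_col="LOCAL_DATE"):
    """Keep rows whose month lies between first_month and last_month, inclusive."""
    # Parse the timestamps once and extract the month number (1-12)
    months = pd.to_datetime(hourly[date_col]).dt.month
    # between() is inclusive on both ends; copy() avoids chained-assignment warnings later
    return hourly[months.between(first_month, last_month)].copy()
```

Applied right after the import, for example `dataset_nb = keep_months(dataset_nb)`, this would reduce the number of rows that the later parsing and grouping steps have to handle.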

In [None]:
Temp_Humidex["IS_SUMMER"] = Temp_Humidex.apply(lambda x: "Yes" if (x["DATE_FORMATTED"].month >= 4) and x["DATE_FORMATTED"].month < 9 else "No",axis=1)

condition_winter_start = Temp_Humidex['IS_SUMMER'] == "Yes"

#Copy the filtered rows so that the new columns added below are not written to a slice
Temp_Humidex = Temp_Humidex[(condition_winter_start)].copy()

Temp_Humidex = Temp_Humidex.drop(["IS_SUMMER"], axis=1)

#Define Year column
Temp_Humidex["YEAR"] = pd.DatetimeIndex(Temp_Humidex["DATE_FORMATTED"]).year

In [None]:
# To Numeric
cols = ["TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX"]
Temp_Humidex[cols] = Temp_Humidex[cols].apply(pd.to_numeric, errors='coerce', axis=1)

#Adjust Column position
Temp_Humidex =Temp_Humidex[["STATION_NAME","DATE_FORMATTED","YEAR","TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX","TEMP_30","TEMP_40","NIGHT_18","HUMIDEX_36","HUMIDEX_40","HUMIDEX_45","GROUP_ALERT","COUNT_SET","LEVEL_1_ALERT","LEVEL_2_ALERT","LEVEL_3_ALERT"]]
Temp_Humidex.head(5)

Note: The 'Temp_Humidex' dataframe is missing some columns from the dataframe loaded at the top ('dataset_nb'). If you need some of those columns, such as the x and y coordinates for visualising data on a map, you can link them from the 'dataset_nb' dataframe.

### Visualizations

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
#from statsmodels.graphics.mosaicplot import mosaic

In [None]:
def count_plots_univariate(listOfColumns,legend_col,plotType,labely,labelx,dataset):
    totalCols=3
    totalRows=math.ceil(len(listOfColumns)/totalCols)
    fig = plt.figure(figsize=((totalCols+3)*3,(totalRows+1)*3))
    plt.subplots_adjust(wspace=0.5, hspace=0.5)
    for x, column in enumerate(listOfColumns):
        #One subplot per column, kept in a local variable rather than in globals()
        ax = fig.add_subplot(totalRows,totalCols,(x+1))
        ax.set_title(column.upper()+" Count")
        ax.set_xlabel(labelx)
        ax.set_ylabel(labely)
        if(plotType=="countplot"):
            sns.countplot(y=column, data=dataset, hue=legend_col, ax=ax, palette='Paired')
            ax.legend(fontsize=8,bbox_to_anchor=(1.1,1))
        if(plotType=="histogram"):
            ax.hist(dataset[column], bins=20)

    plt.show()
        

In [None]:
#Visualize Level 1 days

L1_Visual = Temp_Humidex[(Temp_Humidex['LEVEL_1_ALERT'] == "Level 1")].copy()

In [None]:
count_plots_univariate(['LEVEL_1_ALERT'],"STATION_NAME","countplot","Count","Heat Alert",L1_Visual)

In [None]:
#Visualize Level 2 days

L2_Visual = Temp_Humidex[(Temp_Humidex['LEVEL_2_ALERT'] == "Level 2")].copy()

In [None]:
count_plots_univariate(['LEVEL_2_ALERT'],"STATION_NAME","countplot","Count","Heat Alert",L2_Visual)

In [None]:
#Visualize Level 3 days

L3_Visual = Temp_Humidex[(Temp_Humidex['LEVEL_3_ALERT'] == "Level 3")].copy()

In [None]:
count_plots_univariate(['LEVEL_3_ALERT'],"STATION_NAME","countplot","Count","Heat Alert",L3_Visual)

#### Line Charts

In [None]:
L1_Visual_year1 = Temp_Humidex[["STATION_NAME","DATE_FORMATTED","YEAR","TEMP_MAX"]].copy()
L1_Visual_year1 = L1_Visual_year1[(L1_Visual_year1['YEAR'] == 2017)]


In [None]:

#This is just an example of a line graph, visualization can be better
def linechart_of_categories(dataset,group_by,time_column,value_column):

    dataset = dataset.set_index(time_column)
    #Group by a single column name (not a list) so that the group keys are plain values
    grouped = dataset.groupby(group_by)

    totalCols=3
    totalRows=math.ceil(len(grouped)/totalCols)
    fig = plt.figure(figsize=((totalCols+3)*3,(totalRows+1)*3))
    plt.subplots_adjust(wspace=0.2, hspace=0.5)

    #One subplot per group, drawn on its own axes
    for index, (key, group) in enumerate(grouped):
        ax = fig.add_subplot(totalRows,totalCols,(index+1))
        ax.set_title(str(key).upper())
        group[value_column].plot(ax=ax)
    plt.show()


In [None]:
linechart_of_categories(L1_Visual_year1,'STATION_NAME','DATE_FORMATTED','TEMP_MAX')

In [None]:
#Humidex

#L2_Visual_year1 = Temp_Humidex[["STATION_NAME","DATE_FORMATTED","YEAR","HUMIDEX_MAX"]].copy()
#L2_Visual_year1 = L2_Visual_year1[(L2_Visual_year1['YEAR'] == 2017)]


In [None]:
#linechart_of_categories(L2_Visual_year1,'STATION_NAME','DATE_FORMATTED','HUMIDEX_MAX')

#### Add Station ID

In [None]:
station_ids = dataset_nb_skim[['STATION_NAME','CLIMATE_IDENTIFIER','x','y']].copy()
station_ids.drop_duplicates(inplace=True)
station_ids

In [None]:
Temp_Humidex = pd.merge(Temp_Humidex, station_ids[['STATION_NAME','CLIMATE_IDENTIFIER']],  how='left', left_on=['STATION_NAME'], right_on = ['STATION_NAME'])

In [None]:
#Adjust Column position
Temp_Humidex =Temp_Humidex[["STATION_NAME","CLIMATE_IDENTIFIER","DATE_FORMATTED","YEAR","TEMP_MAX","DAY_MAX_TEMP","NIGHT_MAX_TEMP","TEMP_MIN","DAY_MIN_TEMP","NIGHT_MIN_TEMP","TEMP_MEAN","DAY_MEAN_TEMP","NIGHT_MEAN_TEMP","HUMIDEX_MAX","DAY_MAX_HUMIDEX","NIGHT_MAX_HUMIDEX","HUMIDEX_MIN","DAY_MIN_HUMIDEX","NIGHT_MIN_HUMIDEX","HUMIDEX_MEAN","DAY_MEAN_HUMIDEX","NIGHT_MEAN_HUMIDEX","TEMP_30","TEMP_40","NIGHT_18","HUMIDEX_36","HUMIDEX_40","HUMIDEX_45","GROUP_ALERT","COUNT_SET","LEVEL_1_ALERT","LEVEL_2_ALERT","LEVEL_3_ALERT"]]
Temp_Humidex.tail(5)

## Extra

### Calculate Heat Index/Humidex

If required, cross-validate the humidex values with the code below.

* Use Temperature and Relative Humidity to calculate Dew Point and Humidex
* Source 1: https://gist.github.com/sourceperl/45587ea99ff123745428
* Source 2: https://gist.github.com/VincentLoy/c3a9b3bd97be049bb01f
* Testing Source 1: https://weather.gc.ca/windchill/wind_chill_e.html
* Testing Source 2: https://www.calculator.net/heat-index-calculator.html
* Testing Source 3: https://www.omnicalculator.com/physics/heat-index 
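
For reference, the calculation implemented in `cal_humidex` can be written as follows, where $T$ is the air temperature in °C, $RH$ is the relative humidity in percent, $T_d$ is the dew point in °C, $e$ is the vapour pressure in mbar, and $H$ is the humidex. The symbols are introduced here only for illustration; the constants are the ones used in the code. Note that the code rounds $T_d$ and $H$ to the nearest integer.

$$
\alpha = \frac{A\,T}{B + T} + \ln\!\left(\frac{RH}{100}\right), \qquad T_d = \frac{B\,\alpha}{A - \alpha}, \qquad A = 17.27,\; B = 237.7
$$

$$
e = 6.11\,\exp\!\left[5417.7530\left(\frac{1}{273.15} - \frac{1}{T_d + 273.15}\right)\right], \qquad H = T + 0.5555\,(e - 10)
$$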

In [None]:
def cal_humidex(air_temperature,relative_humidity):
    # air_temperature is in °C, relative_humidity is in percent (greater than 0, up to 100)

    # Calculate Dew Point (°C) with a Magnus-type approximation, rounded to the nearest degree
    A = 17.27
    B = 237.7
    alpha = ((A * air_temperature) / (B + air_temperature)) + math.log(relative_humidity/100.0)
    dewpoint = round((B * alpha) / (A - alpha))

    # Calculate humidex
    kelvin = 273.15
    dewpointKelvin = dewpoint + kelvin

    # Calculate vapor pressure in mbar.
    e = 6.11 * math.exp(5417.7530 * ((1 / kelvin) - (1 / dewpointKelvin)))

    # Calculate the humidity adjustment that is added to the air temperature
    h = 0.5555 * (e - 10.0)

    humidex = air_temperature + h
    return round(humidex)

In [None]:
cal_humidex(16,97)
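
To cross-validate a whole table rather than a single reading, the same formula can be applied column-wise. The sketch below is only an illustration. The function names `humidex_series` and `flag_humidex_mismatches`, the output columns `HUMIDEX_CALC` and `HUMIDEX_DIFF`, and the 1-degree tolerance are assumptions, and the input column names must be adapted to the actual hourly data. Unlike `cal_humidex`, it does not round the dew point or the result, so small differences from the scalar function are expected.

```python
import numpy as np
import pandas as pd


def humidex_series(temp_c, rel_humidity):
    """Vectorized humidex from air temperature (°C) and relative humidity (%)."""
    temp_c = np.asarray(temp_c, dtype=float)
    rel_humidity = np.asarray(rel_humidity, dtype=float)
    # Dew point in °C, using the same constants as cal_humidex (no rounding here)
    a, b = 17.27, 237.7
    alpha = (a * temp_c) / (b + temp_c) + np.log(rel_humidity / 100.0)
    dewpoint_k = (b * alpha) / (a - alpha) + 273.15
    # Vapour pressure in mbar, then the humidity adjustment added to the temperature
    e = 6.11 * np.exp(5417.7530 * (1 / 273.15 - 1 / dewpoint_k))
    return temp_c + 0.5555 * (e - 10.0)


def flag_humidex_mismatches(df, temp_col, rh_col, humidex_col, tolerance=1.0):
    """Return the rows whose reported humidex differs from the computed value
    by more than `tolerance` (hypothetical threshold, in humidex degrees)."""
    computed = humidex_series(df[temp_col], df[rh_col])
    diff = (df[humidex_col] - computed).abs()
    checked = df.assign(HUMIDEX_CALC=computed.round(1), HUMIDEX_DIFF=diff.round(1))
    # Rows with a missing reported humidex compare as False and are not flagged
    return checked[diff > tolerance]
```

A call such as `flag_humidex_mismatches(hourly, "TEMP", "RELATIVE_HUMIDITY", "HUMIDEX")` would return only the suspicious rows, where `hourly` and the three column names are placeholders for the real hourly dataframe.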

### Export an Excel

In [None]:
import xlsxwriter as xl

In [None]:
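def generateExcel(filename,dataset):
    # Write the dataframe to a single worksheet as an Excel table
    sheetname = 'Sheet'
    tablename = 'Table'

    (rows, cols) = dataset.shape
    data = dataset.to_dict('split')['data']
    # xlsxwriter expects one {'header': name} dictionary per column
    headers = [{'header': col} for col in dataset.columns]

    wb = xl.Workbook(filename)
    ws = wb.add_worksheet(sheetname)

    # Row 0 holds the headers, so the table spans rows 0..rows and columns 0..cols-1
    ws.add_table(0, 0, rows, cols-1,
        {'name': tablename
        ,'data': data
        ,'columns': headers})

    wb.close()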


In [None]:
#Replace missing values with empty strings, because xlsxwriter does not write NaN by default
Temp_Humidex = Temp_Humidex.fillna("")

In [None]:
file_name = "heat_nb.xlsx"

generateExcel(file_name,Temp_Humidex)

### Export a CSV

In [None]:
Temp_Humidex.to_csv("heat_nb.csv", sep=',',index=False)

The 'dataset_nb' dataframe is generated by combining the 124 CSV files downloaded from 'https://climate-change.canada.ca/climate-data/#/hourly-climate-data'. If you see any issues with the combined file, you can go back to the downloaded files and rework them. The cells below are wrapped in block comments (triple-quoted strings) to stop the code from running. Please read the details of each block below.

In [None]:
"""

#Load all the CSV files in unique variables.

dataset_nb_raw_1 = pd.read_csv("data/climate-hourly-1-10000.csv", encoding='unicode_escape')
dataset_nb_raw_2 = pd.read_csv("data/climate-hourly-10001-20000.csv", encoding='unicode_escape')
dataset_nb_raw_3 = pd.read_csv("data/climate-hourly-20001-30000.csv", encoding='unicode_escape')
dataset_nb_raw_4 = pd.read_csv("data/climate-hourly-30001-40000.csv", encoding='unicode_escape')
dataset_nb_raw_5 = pd.read_csv("data/climate-hourly-40001-50000.csv", encoding='unicode_escape')
dataset_nb_raw_6 = pd.read_csv("data/climate-hourly-50001-60000.csv", encoding='unicode_escape')
dataset_nb_raw_7 = pd.read_csv("data/climate-hourly-60001-70000.csv", encoding='unicode_escape')
dataset_nb_raw_8 = pd.read_csv("data/climate-hourly-70001-80000.csv", encoding='unicode_escape')
dataset_nb_raw_9 = pd.read_csv("data/climate-hourly-80001-90000.csv", encoding='unicode_escape')
dataset_nb_raw_10 = pd.read_csv("data/climate-hourly-90001-100000.csv", encoding='unicode_escape')
dataset_nb_raw_11 = pd.read_csv("data/climate-hourly-100001-110000.csv", encoding='unicode_escape')
dataset_nb_raw_12 = pd.read_csv("data/climate-hourly-110001-120000.csv", encoding='unicode_escape')
dataset_nb_raw_13 = pd.read_csv("data/climate-hourly-120001-130000.csv", encoding='unicode_escape')
dataset_nb_raw_14 = pd.read_csv("data/climate-hourly-130001-140000.csv", encoding='unicode_escape')
dataset_nb_raw_15 = pd.read_csv("data/climate-hourly-140001-150000.csv", encoding='unicode_escape')
dataset_nb_raw_16 = pd.read_csv("data/climate-hourly-150001-160000.csv", encoding='unicode_escape')
dataset_nb_raw_17 = pd.read_csv("data/climate-hourly-160001-170000.csv", encoding='unicode_escape')
dataset_nb_raw_18 = pd.read_csv("data/climate-hourly-170001-180000.csv", encoding='unicode_escape')
dataset_nb_raw_19 = pd.read_csv("data/climate-hourly-180001-190000.csv", encoding='unicode_escape')
dataset_nb_raw_20 = pd.read_csv("data/climate-hourly-190001-200000.csv", encoding='unicode_escape')
dataset_nb_raw_21 = pd.read_csv("data/climate-hourly-200001-210000.csv", encoding='unicode_escape')
dataset_nb_raw_22 = pd.read_csv("data/climate-hourly-210001-220000.csv", encoding='unicode_escape')
dataset_nb_raw_23 = pd.read_csv("data/climate-hourly-220001-230000.csv", encoding='unicode_escape')
dataset_nb_raw_24 = pd.read_csv("data/climate-hourly-230001-240000.csv", encoding='unicode_escape')
dataset_nb_raw_25 = pd.read_csv("data/climate-hourly-240001-250000.csv", encoding='unicode_escape')
dataset_nb_raw_26 = pd.read_csv("data/climate-hourly-250001-260000.csv", encoding='unicode_escape')
dataset_nb_raw_27 = pd.read_csv("data/climate-hourly-260001-270000.csv", encoding='unicode_escape')
dataset_nb_raw_28 = pd.read_csv("data/climate-hourly-270001-280000.csv", encoding='unicode_escape')
dataset_nb_raw_29 = pd.read_csv("data/climate-hourly-280001-290000.csv", encoding='unicode_escape')
dataset_nb_raw_30 = pd.read_csv("data/climate-hourly-290001-300000.csv", encoding='unicode_escape')
dataset_nb_raw_31 = pd.read_csv("data/climate-hourly-300001-310000.csv", encoding='unicode_escape')
dataset_nb_raw_32 = pd.read_csv("data/climate-hourly-310001-320000.csv", encoding='unicode_escape')
dataset_nb_raw_33 = pd.read_csv("data/climate-hourly-320001-330000.csv", encoding='unicode_escape')
dataset_nb_raw_34 = pd.read_csv("data/climate-hourly-330001-340000.csv", encoding='unicode_escape')
dataset_nb_raw_35 = pd.read_csv("data/climate-hourly-340001-350000.csv", encoding='unicode_escape')
dataset_nb_raw_36 = pd.read_csv("data/climate-hourly-350001-360000.csv", encoding='unicode_escape')
dataset_nb_raw_37 = pd.read_csv("data/climate-hourly-360001-370000.csv", encoding='unicode_escape')
dataset_nb_raw_38 = pd.read_csv("data/climate-hourly-370001-380000.csv", encoding='unicode_escape')
dataset_nb_raw_39 = pd.read_csv("data/climate-hourly-380001-390000.csv", encoding='unicode_escape')
dataset_nb_raw_40 = pd.read_csv("data/climate-hourly-390001-400000.csv", encoding='unicode_escape')
dataset_nb_raw_41 = pd.read_csv("data/climate-hourly-400001-410000.csv", encoding='unicode_escape')
dataset_nb_raw_42 = pd.read_csv("data/climate-hourly-410001-420000.csv", encoding='unicode_escape')
dataset_nb_raw_43 = pd.read_csv("data/climate-hourly-420001-430000.csv", encoding='unicode_escape')
dataset_nb_raw_44 = pd.read_csv("data/climate-hourly-430001-440000.csv", encoding='unicode_escape')
dataset_nb_raw_45 = pd.read_csv("data/climate-hourly-440001-450000.csv", encoding='unicode_escape')
dataset_nb_raw_46 = pd.read_csv("data/climate-hourly-450001-460000.csv", encoding='unicode_escape')
dataset_nb_raw_47 = pd.read_csv("data/climate-hourly-460001-470000.csv", encoding='unicode_escape')
dataset_nb_raw_48 = pd.read_csv("data/climate-hourly-470001-480000.csv", encoding='unicode_escape')
dataset_nb_raw_49 = pd.read_csv("data/climate-hourly-480001-490000.csv", encoding='unicode_escape')
dataset_nb_raw_50 = pd.read_csv("data/climate-hourly-490001-500000.csv", encoding='unicode_escape')
dataset_nb_raw_51 = pd.read_csv("data/climate-hourly-500001-510000.csv", encoding='unicode_escape')
dataset_nb_raw_52 = pd.read_csv("data/climate-hourly-510001-520000.csv", encoding='unicode_escape')
dataset_nb_raw_53 = pd.read_csv("data/climate-hourly-520001-530000.csv", encoding='unicode_escape')
dataset_nb_raw_54 = pd.read_csv("data/climate-hourly-530001-540000.csv", encoding='unicode_escape')
dataset_nb_raw_55 = pd.read_csv("data/climate-hourly-540001-550000.csv", encoding='unicode_escape')
dataset_nb_raw_56 = pd.read_csv("data/climate-hourly-550001-560000.csv", encoding='unicode_escape')
dataset_nb_raw_57 = pd.read_csv("data/climate-hourly-560001-570000.csv", encoding='unicode_escape')
dataset_nb_raw_58 = pd.read_csv("data/climate-hourly-570001-580000.csv", encoding='unicode_escape')
dataset_nb_raw_59 = pd.read_csv("data/climate-hourly-580001-590000.csv", encoding='unicode_escape')
dataset_nb_raw_60 = pd.read_csv("data/climate-hourly-590001-600000.csv", encoding='unicode_escape')
dataset_nb_raw_61 = pd.read_csv("data/climate-hourly-600001-610000.csv", encoding='unicode_escape')
dataset_nb_raw_62 = pd.read_csv("data/climate-hourly-610001-620000.csv", encoding='unicode_escape')
dataset_nb_raw_63 = pd.read_csv("data/climate-hourly-620001-630000.csv", encoding='unicode_escape')
dataset_nb_raw_64 = pd.read_csv("data/climate-hourly-630001-640000.csv", encoding='unicode_escape')
dataset_nb_raw_65 = pd.read_csv("data/climate-hourly-640001-650000.csv", encoding='unicode_escape')
dataset_nb_raw_66 = pd.read_csv("data/climate-hourly-650001-660000.csv", encoding='unicode_escape')
dataset_nb_raw_67 = pd.read_csv("data/climate-hourly-660001-670000.csv", encoding='unicode_escape')
dataset_nb_raw_68 = pd.read_csv("data/climate-hourly-670001-680000.csv", encoding='unicode_escape')
dataset_nb_raw_69 = pd.read_csv("data/climate-hourly-680001-690000.csv", encoding='unicode_escape')
dataset_nb_raw_70 = pd.read_csv("data/climate-hourly-690001-700000.csv", encoding='unicode_escape')
dataset_nb_raw_71 = pd.read_csv("data/climate-hourly-700001-710000.csv", encoding='unicode_escape')
dataset_nb_raw_72 = pd.read_csv("data/climate-hourly-710001-720000.csv", encoding='unicode_escape')
dataset_nb_raw_73 = pd.read_csv("data/climate-hourly-720001-730000.csv", encoding='unicode_escape')
dataset_nb_raw_74 = pd.read_csv("data/climate-hourly-730001-740000.csv", encoding='unicode_escape')
dataset_nb_raw_75 = pd.read_csv("data/climate-hourly-740001-750000.csv", encoding='unicode_escape')
dataset_nb_raw_76 = pd.read_csv("data/climate-hourly-750001-760000.csv", encoding='unicode_escape')
dataset_nb_raw_77 = pd.read_csv("data/climate-hourly-760001-770000.csv", encoding='unicode_escape')
dataset_nb_raw_78 = pd.read_csv("data/climate-hourly-770001-780000.csv", encoding='unicode_escape')
dataset_nb_raw_79 = pd.read_csv("data/climate-hourly-780001-790000.csv", encoding='unicode_escape')
dataset_nb_raw_80 = pd.read_csv("data/climate-hourly-790001-800000.csv", encoding='unicode_escape')
dataset_nb_raw_81 = pd.read_csv("data/climate-hourly-800001-810000.csv", encoding='unicode_escape')
dataset_nb_raw_82 = pd.read_csv("data/climate-hourly-810001-820000.csv", encoding='unicode_escape')
dataset_nb_raw_83 = pd.read_csv("data/climate-hourly-820001-830000.csv", encoding='unicode_escape')
dataset_nb_raw_84 = pd.read_csv("data/climate-hourly-830001-840000.csv", encoding='unicode_escape')
dataset_nb_raw_85 = pd.read_csv("data/climate-hourly-840001-850000.csv", encoding='unicode_escape')
dataset_nb_raw_86 = pd.read_csv("data/climate-hourly-850001-860000.csv", encoding='unicode_escape')
dataset_nb_raw_87 = pd.read_csv("data/climate-hourly-860001-870000.csv", encoding='unicode_escape')
dataset_nb_raw_88 = pd.read_csv("data/climate-hourly-870001-880000.csv", encoding='unicode_escape')
dataset_nb_raw_89 = pd.read_csv("data/climate-hourly-880001-890000.csv", encoding='unicode_escape')
dataset_nb_raw_90 = pd.read_csv("data/climate-hourly-890001-900000.csv", encoding='unicode_escape')
dataset_nb_raw_91 = pd.read_csv("data/climate-hourly-900001-910000.csv", encoding='unicode_escape')
dataset_nb_raw_92 = pd.read_csv("data/climate-hourly-910001-920000.csv", encoding='unicode_escape')
dataset_nb_raw_93 = pd.read_csv("data/climate-hourly-920001-930000.csv", encoding='unicode_escape')
dataset_nb_raw_94 = pd.read_csv("data/climate-hourly-930001-940000.csv", encoding='unicode_escape')
dataset_nb_raw_95 = pd.read_csv("data/climate-hourly-940001-950000.csv", encoding='unicode_escape')
dataset_nb_raw_96 = pd.read_csv("data/climate-hourly-950001-960000.csv", encoding='unicode_escape')
dataset_nb_raw_97 = pd.read_csv("data/climate-hourly-960001-970000.csv", encoding='unicode_escape')
dataset_nb_raw_98 = pd.read_csv("data/climate-hourly-970001-980000.csv", encoding='unicode_escape')
dataset_nb_raw_99 = pd.read_csv("data/climate-hourly-980001-990000.csv", encoding='unicode_escape')
dataset_nb_raw_100 = pd.read_csv("data/climate-hourly-990001-1000000.csv", encoding='unicode_escape')
dataset_nb_raw_101 = pd.read_csv("data/climate-hourly-1000001-1010000.csv", encoding='unicode_escape')
dataset_nb_raw_102 = pd.read_csv("data/climate-hourly-1010001-1020000.csv", encoding='unicode_escape')
dataset_nb_raw_103 = pd.read_csv("data/climate-hourly-1020001-1030000.csv", encoding='unicode_escape')
dataset_nb_raw_104 = pd.read_csv("data/climate-hourly-1030001-1040000.csv", encoding='unicode_escape')
dataset_nb_raw_105 = pd.read_csv("data/climate-hourly-1040001-1050000.csv", encoding='unicode_escape')
dataset_nb_raw_106 = pd.read_csv("data/climate-hourly-1050001-1060000.csv", encoding='unicode_escape')
dataset_nb_raw_107 = pd.read_csv("data/climate-hourly-1060001-1070000.csv", encoding='unicode_escape')
dataset_nb_raw_108 = pd.read_csv("data/climate-hourly-1070001-1080000.csv", encoding='unicode_escape')
dataset_nb_raw_109 = pd.read_csv("data/climate-hourly-1080001-1090000.csv", encoding='unicode_escape')
dataset_nb_raw_110 = pd.read_csv("data/climate-hourly-1090001-1100000.csv", encoding='unicode_escape')
dataset_nb_raw_111 = pd.read_csv("data/climate-hourly-1100001-1110000.csv", encoding='unicode_escape')
dataset_nb_raw_112 = pd.read_csv("data/climate-hourly-1110001-1120000.csv", encoding='unicode_escape')
dataset_nb_raw_113 = pd.read_csv("data/climate-hourly-1120001-1130000.csv", encoding='unicode_escape')
dataset_nb_raw_114 = pd.read_csv("data/climate-hourly-1130001-1140000.csv", encoding='unicode_escape')
dataset_nb_raw_115 = pd.read_csv("data/climate-hourly-1140001-1150000.csv", encoding='unicode_escape')
dataset_nb_raw_116 = pd.read_csv("data/climate-hourly-1150001-1160000.csv", encoding='unicode_escape')
dataset_nb_raw_117 = pd.read_csv("data/climate-hourly-1160001-1170000.csv", encoding='unicode_escape')
dataset_nb_raw_118 = pd.read_csv("data/climate-hourly-1170001-1180000.csv", encoding='unicode_escape')
dataset_nb_raw_119 = pd.read_csv("data/climate-hourly-1180001-1190000.csv", encoding='unicode_escape')
dataset_nb_raw_120 = pd.read_csv("data/climate-hourly-1190001-1200000.csv", encoding='unicode_escape')
dataset_nb_raw_121 = pd.read_csv("data/climate-hourly-1200001-1210000.csv", encoding='unicode_escape')
dataset_nb_raw_122 = pd.read_csv("data/climate-hourly-1210001-1220000.csv", encoding='unicode_escape')
dataset_nb_raw_123 = pd.read_csv("data/climate-hourly-1220001-1230000.csv", encoding='unicode_escape')
dataset_nb_raw_124 = pd.read_csv("data/climate-hourly-1230001-1233614.csv", encoding='unicode_escape')

"""


In [None]:
"""

# Combine all the dataframes into one

dataframes = [dataset_nb_raw_1, dataset_nb_raw_2, dataset_nb_raw_3, dataset_nb_raw_4, dataset_nb_raw_5, dataset_nb_raw_6, dataset_nb_raw_7, dataset_nb_raw_8, dataset_nb_raw_9, dataset_nb_raw_10, dataset_nb_raw_11, dataset_nb_raw_12, dataset_nb_raw_13, dataset_nb_raw_14, dataset_nb_raw_15, dataset_nb_raw_16, dataset_nb_raw_17, dataset_nb_raw_18, dataset_nb_raw_19, dataset_nb_raw_20, dataset_nb_raw_21, dataset_nb_raw_22, dataset_nb_raw_23, dataset_nb_raw_24, dataset_nb_raw_25, dataset_nb_raw_26, dataset_nb_raw_27, dataset_nb_raw_28, dataset_nb_raw_29, dataset_nb_raw_30, dataset_nb_raw_31, dataset_nb_raw_32, dataset_nb_raw_33, dataset_nb_raw_34, dataset_nb_raw_35, dataset_nb_raw_36, dataset_nb_raw_37, dataset_nb_raw_38, dataset_nb_raw_39, dataset_nb_raw_40, dataset_nb_raw_41, dataset_nb_raw_42, dataset_nb_raw_43, dataset_nb_raw_44, dataset_nb_raw_45, dataset_nb_raw_46, dataset_nb_raw_47, dataset_nb_raw_48, dataset_nb_raw_49, dataset_nb_raw_50, dataset_nb_raw_51, dataset_nb_raw_52, dataset_nb_raw_53, dataset_nb_raw_54, dataset_nb_raw_55, dataset_nb_raw_56, dataset_nb_raw_57, dataset_nb_raw_58, dataset_nb_raw_59, dataset_nb_raw_60, dataset_nb_raw_61, dataset_nb_raw_62, dataset_nb_raw_63, dataset_nb_raw_64, dataset_nb_raw_65, dataset_nb_raw_66, dataset_nb_raw_67, dataset_nb_raw_68, dataset_nb_raw_69, dataset_nb_raw_70, dataset_nb_raw_71, dataset_nb_raw_72, dataset_nb_raw_73, dataset_nb_raw_74, dataset_nb_raw_75, dataset_nb_raw_76, dataset_nb_raw_77, dataset_nb_raw_78, dataset_nb_raw_79, dataset_nb_raw_80, dataset_nb_raw_81, dataset_nb_raw_82, dataset_nb_raw_83, dataset_nb_raw_84, dataset_nb_raw_85, dataset_nb_raw_86, dataset_nb_raw_87, dataset_nb_raw_88, dataset_nb_raw_89, dataset_nb_raw_90, dataset_nb_raw_91, dataset_nb_raw_92, dataset_nb_raw_93, dataset_nb_raw_94, dataset_nb_raw_95, dataset_nb_raw_96, dataset_nb_raw_97, dataset_nb_raw_98, dataset_nb_raw_99, dataset_nb_raw_100, dataset_nb_raw_101, dataset_nb_raw_102, dataset_nb_raw_103, dataset_nb_raw_104, 
dataset_nb_raw_105, dataset_nb_raw_106, dataset_nb_raw_107, dataset_nb_raw_108, dataset_nb_raw_109, dataset_nb_raw_110, dataset_nb_raw_111, dataset_nb_raw_112, dataset_nb_raw_113, dataset_nb_raw_114, dataset_nb_raw_115, dataset_nb_raw_116, dataset_nb_raw_117, dataset_nb_raw_118, dataset_nb_raw_119, dataset_nb_raw_120, dataset_nb_raw_121, dataset_nb_raw_122, dataset_nb_raw_123, dataset_nb_raw_124]

#Check that all dataframes have the same columns before combining them

if all(set(dataframes[0].columns) == set(df.columns) for df in dataframes):
    print('Datasets have the same columns')
    dataset_nb_raw = pd.concat(dataframes, ignore_index=True)
else:
    # Stop here; otherwise dataset_nb_raw would be undefined in the next step
    raise ValueError('Datasets do not have the same columns')

dataset_nb = dataset_nb_raw.copy()
dataset_nb.head(5)

"""

In [None]:
"""

#Export Combined Dataset to a CSV

dataset_nb.to_csv("data/climate-hourly-nb.csv", sep=',',index=False)

"""
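
The load-and-combine steps in the commented-out cells above could also be written as a loop over the files instead of one variable per file. The sketch below is an illustrative alternative, not the code that produced the combined file. The function name `combine_csv_chunks` and its parameters are assumptions, and it relies on the chunk files following the `climate-hourly-<first row>-<last row>.csv` naming seen above.

```python
from pathlib import Path

import pandas as pd


def combine_csv_chunks(folder, pattern="climate-hourly-[0-9]*-[0-9]*.csv",
                       encoding="unicode_escape"):
    """Read every CSV chunk in `folder` matching `pattern` and stack them."""
    # Sort by the first row number embedded in the file name
    # (e.g. "climate-hourly-10001-20000.csv" -> 10001) to keep the chunk order.
    paths = sorted(Path(folder).glob(pattern),
                   key=lambda p: int(p.stem.split("-")[2]))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")

    frames = [pd.read_csv(p, encoding=encoding) for p in paths]

    # Same check as in the cell above: every chunk must have the same columns
    first_columns = set(frames[0].columns)
    if any(set(frame.columns) != first_columns for frame in frames[1:]):
        raise ValueError("Datasets do not have the same columns")

    return pd.concat(frames, ignore_index=True)
```

With the files stored as above, `dataset_nb = combine_csv_chunks("data")` would rebuild the combined dataframe. The pattern requires a digit after `climate-hourly-`, so the exported `climate-hourly-nb.csv` in the same folder is not read back in.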