In [3]:
import numpy as np
import pandas as pd

# Introduction

**A heatwave is an extended period of extremely high temperatures relative to the climate of the location. In much of the literature, heatwave severity is measured by the Excess Heat Factor (EHF) [Nairn et al., 2015](https://pubmed.ncbi.nlm.nih.gov/25546282/), which is defined as:**

$$EHF = EHI_{sig} \times \max(1, EHI_{accl})$$

where

$$EHI_{sig} = \frac{T_i + T_{i+1} + T_{i+2}}{3} - T_{95}$$
$$EHI_{accl} = \frac{T_i + T_{i+1} + T_{i+2}}{3} - \frac{T_{i-1} + T_{i-2} + \dots + T_{i-30}}{30}$$

**Here $T_i$ is the daily mean temperature (DMT) of day $i$, and $T_{95}$ is the 95th percentile of the DMTs. The DMT of a day is the mean of that day's highest and lowest temperatures.**
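The formula can be computed from a daily DMT series with pandas rolling windows. The sketch below is only an illustration and not necessarily how this notebook computes EHF later. `excess_heat_factor` is a hypothetical helper that assumes `dmt` holds one value per consecutive day with no missing dates, so that window positions correspond to days. It takes $T_{95}$ as an input because the period used to compute the percentile is a separate choice.

```python
import pandas as pd


def excess_heat_factor(dmt: pd.Series, t95: float) -> pd.Series:
    """Hypothetical helper: EHF for each day i of a gap-free daily DMT series.

    Days without a full 30-day history or a full 3-day look-ahead come out NaN.
    """
    # Mean of T_i, T_{i+1}, T_{i+2}: the trailing 3-day window ending at i+2,
    # shifted back two positions so it is labelled with day i.
    next_3 = dmt.rolling(window=3).mean().shift(-2)
    # Mean of T_{i-1}, ..., T_{i-30}: the trailing 30-day window ending at i-1,
    # shifted forward one position so it is labelled with day i.
    prev_30 = dmt.rolling(window=30).mean().shift(1)
    ehi_sig = next_3 - t95
    ehi_accl = next_3 - prev_30
    # max(1, EHI_accl); clip leaves NaN values as NaN
    return ehi_sig * ehi_accl.clip(lower=1.0)
```

For example, `excess_heat_factor(dmt, dmt.quantile(0.95))` uses the whole record to set $T_{95}$. If any value in a window is NaN, that day's EHF is also NaN, because `rolling` requires a full window by default.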

# Data collection

**The following temperature data was downloaded in CSV format from [Opendata meteo.be](https://opendata.meteo.be):**


*Cities: Brussels, Antwerp, Liege*

*Period: 1952-2021* 

*Frequency: Hourly (the earliest records, shown in the preview below, are 3 hours apart)*

# Data cleaning and transformation

In [4]:
Bru=pd.read_csv("./data/Brussels.csv")
Ant=pd.read_csv("./data/Antwerp.csv")
Lie=pd.read_csv("./data/Liege.csv")
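Only `timestamp` and `temp` are used later, so an alternative is to have `read_csv` load just those two columns and parse the timestamps during the read. This is a minimal sketch. `load_temperature` is a hypothetical helper, and the two sample rows are taken from the Brussels preview with most columns dropped.

```python
import io

import pandas as pd


def load_temperature(source) -> pd.DataFrame:
    """Hypothetical helper: read only 'timestamp' and 'temp', parsing timestamps."""
    return pd.read_csv(source, usecols=["timestamp", "temp"], parse_dates=["timestamp"])


# Two rows from the Brussels preview, trimmed to a few columns
sample = io.StringIO(
    "FID,code,timestamp,temp\n"
    "synop_data.6451.1952-01-01 00:00:00+00,6451,1952-01-01T00:00:00,3.0\n"
    "synop_data.6451.1952-01-01 03:00:00+00,6451,1952-01-01T03:00:00,3.0\n"
)
df = load_temperature(sample)
# With the real files this would be, e.g., load_temperature("./data/Brussels.csv")
```

With this approach, `timestamp` is already a datetime column, so the separate `pd.to_datetime` step is not needed.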

In [7]:
Bru.head()

| | FID | the_geom | code | timestamp | precip_quantity | precip_range | temp | temp_min | temp_max | temp_grass_min | ... | wind_speed_unit | wind_direction | wind_peak_speed | humidity_relative | weather_current | pressure | pressure_station_level | sun_duration_24hours | short_wave_from_sky_24hours | cloudiness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | synop_data.6451.1952-01-01 00:00:00+00 | POINT (50.896391 4.526765) | 6451 | 1952-01-01T00:00:00 | | | 3.0 | | | | ... | | 90.0 | | | 61.0 | 1005.5 | | | | 8.0 |
| 1 | synop_data.6451.1952-01-01 03:00:00+00 | POINT (50.896391 4.526765) | 6451 | 1952-01-01T03:00:00 | | | 3.0 | | | | ... | | | | | 50.0 | 1003.1 | | | | 8.0 |
| 2 | synop_data.6451.1952-01-01 06:00:00+00 | POINT (50.896391 4.526765) | 6451 | 1952-01-01T06:00:00 | 2.0 | 2.0 | 3.0 | 3.0 | | | ... | | 250.0 | | | 51.0 | 1004.0 | | | | 8.0 |
| 3 | synop_data.6451.1952-01-01 09:00:00+00 | POINT (50.896391 4.526765) | 6451 | 1952-01-01T09:00:00 | | | 3.0 | | | | ... | | 270.0 | | | 21.0 | 1006.9 | | | | 5.0 |
| 4 | synop_data.6451.1952-01-01 12:00:00+00 | POINT (50.896391 4.526765) | 6451 | 1952-01-01T12:00:00 | | | 4.0 | | | | ... | | 260.0 | | | 25.0 | 1009.2 | | | | 6.0 |


**Since only the temperature information is needed, we keep just the `timestamp` and `temp` columns.**

In [17]:
Bru=Bru[["timestamp", "temp"]]
Ant=Ant[["timestamp", "temp"]]
Lie=Lie[["timestamp", "temp"]]

In [18]:
Bru.head()

| | timestamp | temp |
|---|---|---|
| 0 | 1952-01-01T00:00:00 | 3.0 |
| 1 | 1952-01-01T03:00:00 | 3.0 |
| 2 | 1952-01-01T06:00:00 | 3.0 |
| 3 | 1952-01-01T09:00:00 | 3.0 |
| 4 | 1952-01-01T12:00:00 | 4.0 |


In [41]:
# assign() returns a new DataFrame. Unlike assigning a column directly on the
# column subset of Bru created above, it does not raise a SettingWithCopyWarning.
Bru = Bru.assign(timestamp_formatted=pd.to_datetime(Bru['timestamp']))
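Once the timestamps are parsed, the DMT from the introduction can be computed by resampling to calendar days. This is a minimal sketch, and `daily_mean_temperature` is a hypothetical helper. It takes the highest and lowest `temp` readings of each day. With readings 3 hours apart, these readings may miss the true daily extremes; the raw files also contain `temp_min` and `temp_max` columns, which could be used instead. The FID strings end in `+00`, which suggests the timestamps are UTC, so the days here would be UTC days (an assumption).

```python
import pandas as pd


def daily_mean_temperature(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: DMT per calendar day, i.e. (highest + lowest 'temp') / 2."""
    temps = df.set_index(pd.to_datetime(df["timestamp"]))["temp"]
    daily = temps.resample("D")
    # max()/min() skip NaN readings; a day with no valid reading stays NaN
    return (daily.max() + daily.min()) / 2
```

Because `resample("D")` emits one row per calendar day, including days with no readings, the result is the gap-free daily series that a rolling-window EHF calculation needs.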


In [23]:
Bru.isna().sum()/Bru.shape[0]

timestamp    0.000000
temp         0.000794
dtype: float64

In [24]:
Ant.isna().sum()/Ant.shape[0]

timestamp    0.000000
temp         0.034075
dtype: float64

In [25]:
Lie.isna().sum()/Lie.shape[0]

timestamp    0.000000
temp         0.009356
dtype: float64
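These ratios give the share of missing `temp` readings over the whole record: about 0.08% for Brussels, 3.4% for Antwerp and 0.9% for Liege. The same number is given by `isna().mean()`. The overall share does not show whether gaps are spread out or concentrated in a few years, which matters for the rolling windows in the EHF. A per-year breakdown can show this. This is a minimal sketch; `missing_fraction_by_year` is a hypothetical helper that assumes `timestamp` is still the string column as loaded.

```python
import pandas as pd


def missing_fraction_by_year(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: share of rows with a missing 'temp' value, per calendar year."""
    years = pd.to_datetime(df["timestamp"]).dt.year
    # isna() yields booleans, and the mean of booleans is the fraction that is True
    return df["temp"].isna().groupby(years).mean()
```

For example, `missing_fraction_by_year(Ant).sort_values(ascending=False).head()` would list the years with the most missing Antwerp readings.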