Data processing

In [653]:
import pandas as pd
import dateparser
import numpy as np
from datetime import timedelta

**Problem Statement :**
My project aims to predict the likelihood of a fire department intervention from meteorological data, using the interventions recorded by SDIS Chamberonne. The approach could later be extended to the entire canton of Vaud to support resource management and preventive measures.
Research question: Can weather conditions in a given region be used to accurately predict whether a fire department intervention is likely to occur?

**Step 1 : Dataset Selection and Description**

All my data is publicly available. Fire department intervention data was collected from the official SDIS Chamberonne website (https://www.sdis-chamberonne.ch/alarmes/), weather data came from the Visual Crossing weather history service (https://www.visualcrossing.com/weather-history/Chavannes-pr%C3%A8s-renens/us/last15days#), and school vacation data was obtained from feiertagskalender.ch (https://www.feiertagskalender.ch/ferien.php?geo=2451&hl=fr).

I didn't need to anonymize the fire department data, as it was already anonymous.

In [654]:
# Load the yearly meteorological data exports
bdd2020 = pd.read_csv('chavannes-près-renens 2020-01-01 to 2020-12-31.csv')
bdd2021 = pd.read_csv('chavannes-près-renens 2021-01-01 to 2021-12-31.csv')
bdd2022 = pd.read_csv('chavannes-près-renens 2022-01-01 to 2022-12-31.csv')
bdd2023 = pd.read_csv('chavannes-près-renens 2023-01-01 to 2023-12-31.csv')

# Work on copies so the raw exports stay untouched
# (read_csv already returns DataFrames, so no extra conversion is needed)
df_2020 = bdd2020.copy()
df_2021 = bdd2021.copy()
df_2022 = bdd2022.copy()
df_2023 = bdd2023.copy()

df_2023.head()



Unnamed: 0,name,datetime,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,...,solarenergy,uvindex,severerisk,sunrise,sunset,moonphase,conditions,description,icon,stations
0,chavannes-près-renens,2023-01-01,57.9,49.6,52.8,57.9,46.6,52.4,44.2,73.1,...,6.0,3,10,2023-01-01T08:17:49,2023-01-01T16:56:41,0.31,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"06610099999,E5414,06700099999,06720099999,0670..."
1,chavannes-près-renens,2023-01-02,55.8,46.8,52.0,55.8,46.3,51.7,39.4,63.2,...,2.4,2,10,2023-01-02T08:17:49,2023-01-02T16:57:38,0.35,"Rain, Partially cloudy",Partly cloudy throughout the day with rain.,rain,"E5414,06704099999,06707099999,06618099999,0670..."
2,chavannes-près-renens,2023-01-03,46.1,42.5,44.5,46.1,42.5,44.5,43.4,96.2,...,1.1,1,10,2023-01-03T08:17:47,2023-01-03T16:58:36,0.38,"Rain, Overcast",Cloudy skies throughout the day with a chance ...,rain,"E5414,06704099999,06707099999,06618099999,0670..."
3,chavannes-près-renens,2023-01-04,46.7,40.6,44.0,44.5,38.4,42.2,41.6,91.5,...,3.5,2,10,2023-01-04T08:17:41,2023-01-04T16:59:37,0.41,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"E5414,06704099999,06707099999,06618099999,0670..."
4,chavannes-près-renens,2023-01-05,49.9,43.4,45.9,49.9,40.7,44.3,44.3,93.9,...,5.0,3,10,2023-01-05T08:17:34,2023-01-05T17:00:40,0.45,"Rain, Partially cloudy",Partly cloudy throughout the day with a chance...,rain,"E5414,06704099999,06707099999,06618099999,0670..."
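Since the four yearly files share the same layout, loading, date parsing and stacking could also be done in one pass. The following is only a minimal sketch, assuming each export has a `datetime` column. `load_weather` is a hypothetical helper, and the in-memory CSVs are made-up stand-ins for the real files.

```python
import io

import pandas as pd


def load_weather(sources):
    """Read each yearly export, parse the 'datetime' column, and stack the rows."""
    frames = [pd.read_csv(src, parse_dates=["datetime"]) for src in sources]
    return pd.concat(frames, ignore_index=True)


# Made-up stand-ins for two yearly CSV exports (values are illustrative only).
csv_2020 = io.StringIO("name,datetime,tempmax\nx,2020-01-01,34.7\nx,2020-01-02,36.0\n")
csv_2021 = io.StringIO("name,datetime,tempmax\nx,2021-01-01,40.1\n")

weather = load_weather([csv_2020, csv_2021])
print(weather.dtypes)
```

With real files, the list passed to `load_weather` would simply contain the four file paths.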


**Feature descriptions :**
tempmax	Maximum temperature  
tempmin	Minimum temperature  
temp	Mean temperature  
dew	Dew point  
feelslike	Feels-like temperature  
precip	Precipitation  
precipprob	Chance of precipitation  
precipcover	Precipitation cover (the proportion of time for which measurable precipitation was recorded during the time period)  
preciptype	Precipitation type  
snow	Snow  
snowdepth	Snow depth  
windspeed	Wind speed  
windgust	Maximum wind speed measured over a short amount of time  
winddir	Wind direction  
visibility	Distance that can be seen in daylight  
cloudcover	Cloud cover  
humidity	Relative humidity  
sealevelpressure	Sea level pressure (the atmospheric pressure at a location, adjusted to remove the reduction in pressure due to the altitude of the location)  
solarradiation	Solar radiation (the power in W/m2 at the instantaneous moment of the observation)  
solarenergy	Solar energy (total energy from the sun that builds up over an hour or day)  
uvindex	UV index  
severerisk	Severe risk  
sunrise	Sunrise time  
sunset	Sunset time  
moonphase	Moon phase  
icon	A weather icon  
conditions	Short text about the weather  
description	Description of the weather for the day  
stations	List of weather station sources  
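The export appears to use US units: the source URL contains `/us/`, and a maximum of 57.9 on 1 January 2023 cannot be degrees Celsius. If metric values are preferred for interpretation, a conversion along these lines could be applied. This is a sketch under that assumption (temperatures in °F, precipitation in inches, wind speed in mph). `to_metric` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def to_metric(df):
    """Return a copy with temperature, precipitation and wind speed in metric units.

    Assumes the input uses US units (degrees F, inches, mph).
    """
    out = df.copy()
    temp_cols = [c for c in ["tempmax", "tempmin", "temp", "feelslike", "dew"] if c in out.columns]
    out[temp_cols] = (out[temp_cols] - 32) * 5 / 9       # degrees F -> degrees C
    if "precip" in out.columns:
        out["precip"] = out["precip"] * 25.4             # inches -> millimetres
    if "windspeed" in out.columns:
        out["windspeed"] = out["windspeed"] * 1.609344   # mph -> km/h
    return out


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({"tempmax": [57.9, 32.0], "precip": [0.0, 1.0], "windspeed": [3.4, 10.0]})
metric = to_metric(sample)
print(metric.round(2))
```

The conversion is linear, so it changes how the values read but not how a model ranks them.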

In [656]:
# Convert the 'datetime' columns to the datetime dtype
df_2020['datetime']=pd.to_datetime(df_2020['datetime'])
df_2021['datetime']=pd.to_datetime(df_2021['datetime'])
df_2022['datetime']=pd.to_datetime(df_2022['datetime'])
df_2023['datetime']=pd.to_datetime(df_2023['datetime'])
df_2020.dtypes

name                        object
datetime            datetime64[ns]
tempmax                    float64
tempmin                    float64
temp                       float64
feelslikemax               float64
feelslikemin               float64
feelslike                  float64
dew                        float64
humidity                   float64
precip                     float64
precipprob                   int64
precipcover                float64
preciptype                  object
snow                       float64
snowdepth                  float64
windgust                   float64
windspeed                  float64
winddir                    float64
sealevelpressure           float64
cloudcover                 float64
visibility                 float64
solarradiation             float64
solarenergy                float64
uvindex                      int64
severerisk                 float64
sunrise                     object
sunset                      object
moonphase           

**Step 2 : Data Preprocessing and Feature Extraction**

In [658]:
# Concatenate the four yearly DataFrames, then drop the columns that are not used
df_meteo = pd.concat([df_2020, df_2021, df_2022, df_2023], ignore_index=True)
df_meteo = df_meteo.drop(columns=['name', 'feelslikemax', 'feelslikemin', 'windgust', 'uvindex',
                                  'icon', 'sunrise', 'sunset', 'moonphase', 'stations'])
df_meteo.head()

Unnamed: 0,datetime,tempmax,tempmin,temp,feelslike,dew,humidity,precip,precipprob,precipcover,...,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,severerisk,conditions,description
0,2020-01-01,34.7,32.2,33.2,33.1,31.5,93.5,0.0,0,0.0,...,3.4,332.1,1034.1,33.9,,47.4,4.1,,Partially cloudy,Clearing in the afternoon.
1,2020-01-02,36.0,30.7,33.8,33.2,32.3,94.2,0.0,0,0.0,...,4.1,326.8,1032.4,24.0,,69.2,6.0,,Partially cloudy,Partly cloudy throughout the day.
2,2020-01-03,45.1,35.6,40.3,37.6,36.1,85.2,0.007,100,8.33,...,6.3,321.1,1030.2,78.4,,42.9,3.8,,"Rain, Partially cloudy",Partly cloudy throughout the day with late aft...
3,2020-01-04,47.4,36.2,42.2,40.2,37.0,82.1,0.003,100,12.5,...,8.0,346.0,1033.1,65.3,,67.9,5.9,,"Rain, Partially cloudy",Partly cloudy throughout the day with morning ...
4,2020-01-05,40.6,33.4,36.4,32.2,30.0,77.3,0.0,0,0.0,...,7.5,39.9,1034.9,14.5,,74.0,6.4,,Clear,Clear conditions throughout the day.
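The preview shows NaN in `visibility` and `severerisk` for the first days of 2020. Before deciding how to fill or drop such columns, it can help to measure how much of each one is missing. A minimal sketch on made-up rows, where `missing_share` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Made-up rows that mimic the pattern seen in the preview.
sample = pd.DataFrame({
    "temp": [33.2, 33.8, 40.3, 42.2],
    "visibility": [np.nan, np.nan, np.nan, 6.2],
    "severerisk": [np.nan, np.nan, 10.0, 10.0],
})
share = missing_share(sample)
print(share)
```

Applied to `df_meteo`, the same call would show whether the gaps are limited to a few days or cover whole years.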


In [659]:
# Load the interventions dataset
bdd_intervention = pd.read_excel('BDD_interventions.xlsx')
interventions_copy = bdd_intervention.copy()
# One-hot encode the intervention type
interventions_type = pd.get_dummies(interventions_copy['Type'], dtype='int')
interventions = pd.concat([interventions_copy, interventions_type], axis=1)
# Drop the rows from the year 2024
interventions = interventions[interventions['Year'] != 2024]

In [660]:
# Parse 'Date début alarmes' and 'Date fin alarme' into datetimes and compute the duration
interventions['Date début alarmes'] = interventions['Date début alarmes'].apply(dateparser.parse)
interventions['Date fin alarme'] = interventions['Date fin alarme'].apply(dateparser.parse)
interventions['Durée'] = interventions['Date fin alarme'] - interventions['Date début alarmes']
# Move 'Durée' so that it sits right after the end date
duree_column = interventions.pop('Durée')
interventions.insert(3, 'Durée', duree_column)
interventions = interventions.drop(['Numéro', 'Description'], axis=1)
interventions.tail()


Unnamed: 0,Date début alarmes,Date fin alarme,Durée,Type,Commune,Year,Alarme automatique,Alarme automatique réelle,Assistance à personne,DCH,Divers,Feu,Inondation,Officier de service,Sauvetage animaux/NAC,Sauvetage personne
562,2023-01-09 01:10:00,2023-01-09 02:26:00,0 days 01:16:00,Divers,Chavannes,2023,0,0,0,0,1,0,0,0,0,0
563,2023-01-08 13:52:00,2023-01-08 17:10:00,0 days 03:18:00,Alarme automatique réelle,Ecublens,2023,0,1,0,0,0,0,0,0,0,0
564,2023-01-07 09:36:00,2023-01-09 11:10:00,2 days 01:34:00,Divers,Chavannes,2023,0,0,0,0,1,0,0,0,0,0
565,2023-01-03 13:04:00,2023-01-03 14:24:00,0 days 01:20:00,Divers,Ecublens,2023,0,0,0,0,1,0,0,0,0,0
566,2023-01-03 12:48:00,2023-01-03 13:02:00,0 days 00:14:00,Alarme automatique,Ecublens,2023,1,0,0,0,0,0,0,0,0,0
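Row 564 above lasts more than two days, which may be a genuine long intervention or a data-entry slip. A quick check for negative or unusually long durations can surface such rows for manual review. This is a sketch on made-up rows. `flag_suspect_durations` is a hypothetical helper and the 24-hour threshold is an arbitrary assumption.

```python
import pandas as pd


def flag_suspect_durations(df, max_duration=pd.Timedelta(hours=24)):
    """Return the rows whose duration is negative or longer than max_duration."""
    duration = df["Date fin alarme"] - df["Date début alarmes"]
    return df[(duration < pd.Timedelta(0)) | (duration > max_duration)]


# Made-up rows: one normal, one very long, one ending before it starts.
sample = pd.DataFrame({
    "Date début alarmes": pd.to_datetime(["2023-01-09 01:10", "2023-01-07 09:36", "2023-01-03 12:48"]),
    "Date fin alarme": pd.to_datetime(["2023-01-09 02:26", "2023-01-09 11:10", "2023-01-03 12:30"]),
})
suspect = flag_suspect_durations(sample)
print(suspect)
```

The flagged rows are only candidates. Whether they are errors has to be checked against the source.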


In [661]:
# Separate the date and the time into two columns
interventions['Heure début alarmes'] =interventions['Date début alarmes'].dt.time
interventions['Heure fin alarmes'] =interventions['Date fin alarme'].dt.time
interventions = interventions.rename(columns = {'Date fin alarme':'Date fin alarmes'})
interventions['Date début alarmes'] = interventions['Date début alarmes'].dt.date
interventions['Date fin alarmes'] = interventions['Date fin alarmes'].dt.date

heure_debut_column = interventions.pop('Heure début alarmes') 
heure_fin_column = interventions.pop('Heure fin alarmes')

interventions.insert(1, 'Heure début alarmes', heure_debut_column) 
interventions.insert(3, 'Heure fin alarmes', heure_fin_column)

interventions.head(10)

Unnamed: 0,Date début alarmes,Heure début alarmes,Date fin alarmes,Heure fin alarmes,Durée,Type,Commune,Year,Alarme automatique,Alarme automatique réelle,Assistance à personne,DCH,Divers,Feu,Inondation,Officier de service,Sauvetage animaux/NAC,Sauvetage personne
0,2020-12-25,00:01:00,2020-12-25,00:44:00,0 days 00:43:00,Inondation,Ecublens,2020,0,0,0,0,0,0,1,0,0,0
1,2020-12-17,08:59:00,2020-12-17,09:40:00,0 days 00:41:00,Alarme automatique,Ecublens,2020,1,0,0,0,0,0,0,0,0,0
2,2020-12-12,21:40:00,2020-12-13,22:45:00,1 days 01:05:00,Divers,Ecublens,2020,0,0,0,0,1,0,0,0,0,0
3,2020-12-12,18:49:00,2020-12-12,20:00:00,0 days 01:11:00,Inondation,Chavannes,2020,0,0,0,0,0,0,1,0,0,0
4,2020-12-07,05:16:00,2020-12-07,05:40:00,0 days 00:24:00,Alarme automatique,Chavannes,2020,1,0,0,0,0,0,0,0,0,0
5,2020-12-05,19:47:00,2020-12-05,21:00:00,0 days 01:13:00,Feu,Chavannes,2020,0,0,0,0,0,1,0,0,0,0
6,2020-12-04,15:20:00,2020-12-04,16:20:00,0 days 01:00:00,Alarme automatique,Chavannes,2020,1,0,0,0,0,0,0,0,0,0
7,2020-12-02,18:48:00,2020-12-02,19:25:00,0 days 00:37:00,Alarme automatique réelle,Saint-Sulpice,2020,0,1,0,0,0,0,0,0,0,0
8,2020-11-29,22:20:00,2020-11-29,22:45:00,0 days 00:25:00,Divers,Chavannes,2020,0,0,0,0,1,0,0,0,0,0
9,2020-11-29,10:28:00,2020-11-29,11:45:00,0 days 01:17:00,Officier de service,Chavannes,2020,0,0,0,0,0,0,0,1,0,0


In [662]:
# Convert 'Date début alarmes' and 'Date fin alarmes' back to datetime (.dt.date left them as object)
interventions['Date début alarmes'] = pd.to_datetime(interventions['Date début alarmes'])
interventions['Date fin alarmes'] = pd.to_datetime(interventions['Date fin alarmes'])
print(interventions.dtypes)

Date début alarmes            datetime64[ns]
Heure début alarmes                   object
Date fin alarmes              datetime64[ns]
Heure fin alarmes                     object
Durée                        timedelta64[ns]
Type                                  object
Commune                               object
Year                                   int64
Alarme automatique                     int32
Alarme automatique réelle              int32
Assistance à personne                  int32
DCH                                    int32
Divers                                 int32
Feu                                    int32
Inondation                             int32
Officier de service                    int32
Sauvetage animaux/NAC                  int32
Sauvetage personne                     int32
dtype: object
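The split into date and time followed by a second conversion could probably be shortened. `Series.dt.normalize()` sets the time to midnight but keeps the datetime dtype, so no reconversion is needed. A minimal sketch on made-up timestamps:

```python
import pandas as pd

# Made-up alarm start timestamps, for illustration only.
start = pd.Series(pd.to_datetime(["2020-12-25 00:01", "2020-12-17 08:59"]))

day = start.dt.normalize()    # midnight of the same day, still a datetime column
time_of_day = start.dt.time   # plain datetime.time objects (object dtype)

print(day.dtype)
print(time_of_day.tolist())
```

Because `day` stays a datetime column, it can be used directly as a merge key against the weather dates.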


In [663]:
# Merge the two datasets on the calendar day
interventions = interventions.rename(columns={'Date début alarmes': 'datetime'})
bdd_original = pd.merge(interventions, df_meteo, how='outer', on='datetime')
# Days without an intervention have NaN in the one-hot columns: set them to 0
type_cols = ['Alarme automatique', 'Alarme automatique réelle', 'Assistance à personne', 'DCH', 'Divers',
             'Feu', 'Inondation', 'Officier de service', 'Sauvetage animaux/NAC', 'Sauvetage personne']
bdd_original[type_cols] = bdd_original[type_cols].fillna(0).astype('int')
bdd_original.dtypes


datetime                      datetime64[ns]
Heure début alarmes                   object
Date fin alarmes              datetime64[ns]
Heure fin alarmes                     object
Durée                        timedelta64[ns]
Type                                  object
Commune                               object
Year                                 float64
Alarme automatique                     int32
Alarme automatique réelle              int32
Assistance à personne                  int32
DCH                                    int32
Divers                                 int32
Feu                                    int32
Inondation                             int32
Officier de service                    int32
Sauvetage animaux/NAC                  int32
Sauvetage personne                     int32
tempmax                              float64
tempmin                              float64
temp                                 float64
feelslike                            float64
dew       
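An outer merge keeps days without interventions and repeats the weather row for days with several. The `indicator=True` option of `pd.merge` adds a `_merge` column that makes both effects visible. A minimal sketch on made-up frames that only mimic the structure of the two datasets:

```python
import pandas as pd

# Made-up data: two interventions on 1 January, none on 2 January.
interventions = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-03"]),
    "Type": ["Feu", "Divers", "Inondation"],
})
weather = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "precip": [0.0, 0.2, 0.0],
})

merged = pd.merge(interventions, weather, how="outer", on="datetime", indicator=True)

# 'both' = day with an intervention, 'right_only' = weather day without one.
print(merged["_merge"].value_counts())

# Number of rows per calendar day after the merge.
rows_per_day = merged.groupby("datetime").size()
print(rows_per_day)
```

Any `left_only` rows would point to interventions on days missing from the weather data.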

In [664]:
# Create new columns/features
# 1 if the row corresponds to an intervention, 0 for a day without one
bdd_original['Intervention'] = bdd_original['Type'].notna().astype(int)
# dayofweek is 0-4 for Monday-Friday and 5-6 for Saturday-Sunday, so // 5 yields 0 or 1
bdd_original['Weekend'] = (bdd_original['datetime'].dt.dayofweek // 5).astype(str)

# Fill missing values by assignment: chained fillna(..., inplace=True) raises a
# FutureWarning and will stop working in pandas 3.0
bdd_original['Type'] = bdd_original['Type'].fillna('Aucun')
bdd_original['preciptype'] = bdd_original['preciptype'].fillna('Rien')
bdd_original['visibility'] = bdd_original['visibility'].fillna(0)
bdd_original['severerisk'] = bdd_original['severerisk'].fillna(0)

# Total precipitation over the rows that fall within the last 7 days
bdd_original['Precip_last_7D'] = bdd_original.rolling(window="7D", on='datetime')['precip'].sum()
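One point to keep in mind with the rolling sum: in the merged table a day with several interventions appears on several rows, so its precipitation can be added more than once. A possible alternative, shown here only as a sketch on made-up data, is to compute the 7-day total on the weather table (one row per day) before merging:

```python
import pandas as pd

# Made-up weather table: ten consecutive days with 1.0 of precipitation each.
weather = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01", periods=10, freq="D"),
    "precip": [1.0] * 10,
})

# Time-based windows need the 'on' column sorted in increasing order.
weather = weather.sort_values("datetime")

# "7D" covers the current day and the six days before it.
weather["Precip_last_7D"] = weather.rolling(window="7D", on="datetime")["precip"].sum()
print(weather)
```

After that, `Precip_last_7D` travels with the other weather columns through the merge, and each day carries the same value on all of its rows.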

In [665]:
# Load the school holidays dataset
hol = pd.read_excel('Vacances_scolaires.xlsx')
hol.dtypes

Type                  object
start_date    datetime64[ns]
end_date      datetime64[ns]
dtype: object

In [666]:
# Flag the days that fall within a school holiday period
def check_holidays(day):
    """Return True if `day` lies within any holiday interval listed in `hol`."""
    return any(start_date <= day <= end_date for start_date, end_date in zip(hol['start_date'], hol['end_date']))

# Apply the function to each date in bdd_original['datetime']
bdd_original['holidays'] = bdd_original['datetime'].apply(check_holidays).astype(int)
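The row-by-row check works well at this data size. For larger tables, the same inclusive interval test could be written with NumPy broadcasting, comparing every day against every holiday period at once. A sketch on made-up dates, where the holiday periods below are illustrative and not taken from the real calendar:

```python
import pandas as pd

# Made-up holiday periods and days, for illustration only.
hol = pd.DataFrame({
    "start_date": pd.to_datetime(["2020-02-15", "2020-04-10"]),
    "end_date": pd.to_datetime(["2020-02-23", "2020-04-26"]),
})
days = pd.Series(pd.to_datetime(
    ["2020-02-14", "2020-02-15", "2020-02-23", "2020-03-01", "2020-04-20"]
))

# Shape (n_days, 1) against (n_periods,) broadcasts to (n_days, n_periods).
d = days.to_numpy()[:, None]
inside = (d >= hol["start_date"].to_numpy()) & (d <= hol["end_date"].to_numpy())

# A day is a holiday if it lies inside at least one period (bounds included).
holidays = pd.Series(inside.any(axis=1).astype(int), index=days.index)
print(holidays.tolist())  # [0, 1, 1, 0, 1]
```

Both bounds are inclusive, matching the `start_date <= day <= end_date` test used in the function above.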
