# ROAD ACCIDENT PREDICTION AND CLASSIFICATION


## Introduction
As part of the SALTIS challenge, this notebook presents the study and development of our prediction model.

The work is organized in several steps:
* Setup
* Data Analysis and Preparation
* Data Visualization
* Machine Learning

## Setup
Import the basic modules and configure the environment.

In [15]:
# Python ≥ 3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥ 0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Basic imports
import numpy as np
import os

# For nicer-looking figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Hide warnings to keep the output readable
import warnings
warnings.filterwarnings('ignore')

# Useful variable(s)
ACCIDENT_PATH = "datasets"
FILES = ['Accidents.csv', 'Casualties.csv', 'Vehicles.csv']

# Helper to save figures into the "static" folder
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    os.makedirs("static", exist_ok=True)  # create the folder if it does not exist yet
    path = os.path.join("static", fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)


## Data Analysis and Preparation
In this part, we explore the data, build an intuition for how it is distributed, and finish by cleaning it.

### Analysis
We load three CSV files containing data on road accidents in the United Kingdom over the last five years (2017 to 2021):
* Accidents.csv: the core data on each accident
* Casualties.csv: data on the casualties, i.e. the people injured or killed
* Vehicles.csv: additional information on the vehicles involved

In [10]:
import pandas as pd

# Load one of the CSV files listed in FILES, using accident_index as the row index
def load_data(index):
    csv_path = os.path.join(ACCIDENT_PATH, FILES[index])
    return pd.read_csv(csv_path, index_col='accident_index')

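`load_data` picks a file by its position in `FILES` and uses `accident_index` as the row index, so the three tables share the same index. As a minimal sketch, a hypothetical `load_all` helper (not part of the original notebook) could load several files at once into a dictionary keyed by file name. To stay runnable without the real dataset, the example writes a two-row, three-column excerpt of `Accidents.csv` (values from the preview further down) into a temporary directory.

```python
import os
import tempfile

import pandas as pd

FILES = ['Accidents.csv', 'Casualties.csv', 'Vehicles.csv']


def load_all(base_path, files=FILES):
    """Hypothetical helper: load each CSV in `files` from `base_path`,
    indexed by accident_index, into a dict keyed by file name."""
    return {
        name: pd.read_csv(os.path.join(base_path, name), index_col='accident_index')
        for name in files
    }


# Tiny stand-in for datasets/Accidents.csv (two rows, three columns of the preview)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'Accidents.csv'), 'w') as f:
        f.write("accident_index,accident_year,accident_severity\n"
                "2017010001708,2017,1\n"
                "2017010009342,2017,3\n")
    tables = load_all(tmp, files=['Accidents.csv'])

accidents = tables['Accidents.csv']
print(accidents.index.name, len(accidents))  # → accident_index 2
```

On the real data, `load_all(ACCIDENT_PATH)` would return the three tables in one call; keying by file name rather than by position avoids mix-ups if `FILES` is ever reordered.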
### Loading and Quick Look at the Data

#### Accidents.csv

In [18]:
accidents = load_data(0)
accidents.head()

| accident_index | accident_year | accident_reference | location_easting_osgr | location_northing_osgr | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | ... | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | road_surface_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident | trunk_road_flag | lsoa_of_accident_location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017010001708 | 2017 | 10001708 | 532920.0 | 196330.0 | -0.080107 | 51.650061 | 1 | 1 | 2 | 3 | ... | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | E01001450 |
| 2017010009342 | 2017 | 10009342 | 526790.0 | 181970.0 | -0.173845 | 51.522425 | 1 | 3 | 2 | 1 | ... | 0 | 4 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | E01004702 |
| 2017010009344 | 2017 | 10009344 | 535200.0 | 181260.0 | -0.052969 | 51.514096 | 1 | 3 | 3 | 1 | ... | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | E01004298 |
| 2017010009348 | 2017 | 10009348 | 534340.0 | 193560.0 | -0.060658 | 51.624832 | 1 | 3 | 2 | 1 | ... | 4 | 4 | 2 | 2 | 0 | 0 | 1 | 1 | 2 | E01001429 |
| 2017010009350 | 2017 | 10009350 | 533680.0 | 187820.0 | -0.072372 | 51.573408 | 1 | 2 | 1 | 1 | ... | 5 | 4 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | E01001808 |

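Since the goal is classification, `accident_severity` is a natural candidate target (an assumption: this section does not name the target yet). Assuming the STATS19 coding used by the UK Department for Transport (1 = Fatal, 2 = Serious, 3 = Slight; worth checking against the data guide distributed with the files), its class balance can be inspected with `value_counts`. The sketch runs on the five severities visible in the preview above; on the full `accidents` table the same calls apply to `accidents['accident_severity']`.

```python
import pandas as pd

# accident_severity values of the five preview rows above
severity = pd.Series([1, 3, 3, 3, 2], name='accident_severity')

# Assumed STATS19 coding; verify it against the official data guide.
SEVERITY_LABELS = {1: 'Fatal', 2: 'Serious', 3: 'Slight'}

labels = severity.map(SEVERITY_LABELS)
counts = labels.value_counts()                # absolute number of accidents per class
shares = labels.value_counts(normalize=True)  # fraction of accidents per class
print(counts)
print(shares.round(2))
```

If the full data turns out to be strongly imbalanced, that is worth keeping in mind when splitting the data for the Machine Learning step.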

In [21]:
accidents.info()

<class 'pandas.core.frame.DataFrame'>
Index: 562439 entries, 2017010001708 to 2021991201030
Data columns (total 35 columns):
 #   Column                                       Non-Null Count   Dtype  
---  ------                                       --------------   -----  
 0   accident_year                                562439 non-null  int64  
 1   accident_reference                           562439 non-null  object 
 2   location_easting_osgr                        562306 non-null  float64
 3   location_northing_osgr                       562306 non-null  float64
 4   longitude                                    562296 non-null  float64
 5   latitude                                     562296 non-null  float64
 6   police_force                                 562439 non-null  int64  
 7   accident_severity                            562439 non-null  int64  
 8   number_of_vehicles                           562439 non-null  int64  
 9   number_of_casualties                         

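`info()` reports non-null counts, not missing counts. Subtracting each one from the 562,439 entries gives the number of missing values per column; the sketch below does this by hand for the four coordinate columns, using the figures printed above. On the loaded DataFrame, `accidents.isna().sum()` returns the same counts directly for every column.

```python
TOTAL = 562439  # number of entries reported by accidents.info()

# Non-null counts copied from the info() output above
non_null = {
    'location_easting_osgr': 562306,
    'location_northing_osgr': 562306,
    'longitude': 562296,
    'latitude': 562296,
}

# Missing values = total rows minus non-null rows
missing = {col: TOTAL - n for col, n in non_null.items()}
for col, n in missing.items():
    print(f"{col}: {n} missing ({n / TOTAL:.3%})")
```

Only about 0.02 % of the accidents therefore lack coordinates.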
#### Casualties.csv

In [19]:
casualties = load_data(1)
casualties.head()

| accident_index | accident_year | accident_reference | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | pedestrian_road_maintenance_worker | casualty_type | casualty_home_area_type | casualty_imd_decile | lsoa_of_casualty |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017010001708 | 2017 | 10001708 | 1 | 1 | 2 | 2 | 18 | 4 | 3 | 0 | 0 | 1 | 0 | 0 | 9 | 1 | 2 | E01001414 |
| 2017010001708 | 2017 | 10001708 | 2 | 2 | 1 | 1 | 19 | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | -1 | -1 | -1 |
| 2017010001708 | 2017 | 10001708 | 2 | 3 | 2 | 1 | 18 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | -1 | -1 | -1 |
| 2017010009342 | 2017 | 10009342 | 1 | 1 | 2 | 2 | 33 | 6 | 3 | 0 | 0 | 1 | 0 | 0 | 9 | 1 | 5 | E01000589 |
| 2017010009344 | 2017 | 10009344 | 3 | 1 | 1 | 2 | 31 | 6 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 5 | E01003756 |

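Unlike `Accidents.csv`, `Casualties.csv` can hold several rows per `accident_index`, one per casualty: accident 2017010001708 appears three times above, and its `number_of_casualties` in the accidents preview is 3. A minimal consistency check counts the rows per accident and compares them with `number_of_casualties`. It is shown here only on the two accidents whose casualty rows are fully visible in the previews.

```python
import pandas as pd

# Excerpts of the two previews (accident_index as the row index)
accidents = pd.DataFrame(
    {'number_of_casualties': [3, 1]},
    index=pd.Index([2017010001708, 2017010009342], name='accident_index'),
)
casualties = pd.DataFrame(
    {'casualty_reference': [1, 2, 3, 1]},
    index=pd.Index([2017010001708] * 3 + [2017010009342], name='accident_index'),
)

# One row per casualty, so the group size is the casualty count per accident
rows_per_accident = casualties.groupby(level='accident_index').size()

# assign() aligns the counts on accident_index before comparing
check = accidents.assign(casualty_rows=rows_per_accident)
check['consistent'] = check['number_of_casualties'] == check['casualty_rows']
print(check)
```

On the full tables, any row where `consistent` is `False` would point to accidents worth inspecting during cleaning.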

In [20]:
casualties.info()

<class 'pandas.core.frame.DataFrame'>
Index: 728541 entries, 2017010001708 to 2021991201030
Data columns (total 18 columns):
 #   Column                              Non-Null Count   Dtype 
---  ------                              --------------   ----- 
 0   accident_year                       728541 non-null  int64 
 1   accident_reference                  728541 non-null  object
 2   vehicle_reference                   728541 non-null  int64 
 3   casualty_reference                  728541 non-null  int64 
 4   casualty_class                      728541 non-null  int64 
 5   sex_of_casualty                     728541 non-null  int64 
 6   age_of_casualty                     728541 non-null  int64 
 7   age_band_of_casualty                728541 non-null  int64 
 8   casualty_severity                   728541 non-null  int64 
 9   pedestrian_location                 728541 non-null  int64 
 10  pedestrian_movement                 728541 non-null  int64 
 11  car_passenger            

#### Vehicles.csv

In [17]:
vehicles = load_data(2)
vehicles.head()

| accident_index | accident_year | accident_reference | vehicle_reference | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | ... | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type | lsoa_of_driver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017010001708 | 2017 | 10001708 | 1 | 9 | 0 | 18 | 1 | 5 | 0 | 0 | ... | 1 | 24 | 5 | 1997 | 2 | 1 | -1 | -1 | -1 | -1 |
| 2017010001708 | 2017 | 10001708 | 2 | 2 | 0 | 18 | 1 | 5 | 0 | 0 | ... | 1 | 19 | 4 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 2017010009342 | 2017 | 10009342 | 1 | 9 | 0 | 18 | 5 | 1 | 0 | 1 | ... | 1 | 33 | 6 | 1797 | 8 | 8 | -1 | 9 | 1 | E01023674 |
| 2017010009342 | 2017 | 10009342 | 2 | 9 | 0 | 18 | 5 | 1 | 0 | 1 | ... | 1 | 40 | 7 | 2204 | 2 | 12 | -1 | 2 | 1 | E01004755 |
| 2017010009344 | 2017 | 10009344 | 1 | 9 | 0 | 18 | 3 | 7 | 0 | 1 | ... | 3 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |

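The vehicles preview contains many `-1` values (for instance `age_of_driver` and `engine_capacity_cc` in the last row), even though `info()` reports no nulls. In the DfT STATS19 data guide, `-1` is, to our understanding, the code for "data missing or out of range". Assuming this holds for the coded integer columns, converting it to `NaN` lets pandas count those entries as missing during cleaning. The sketch uses the last two preview rows, restricted to four columns, and deliberately touches only integer columns, since a signed quantity such as `longitude` in `Accidents.csv` can legitimately be negative.

```python
import numpy as np
import pandas as pd

# Last two rows of the vehicles preview above, four columns only
vehicles = pd.DataFrame(
    {
        'age_of_driver': [40, -1],
        'engine_capacity_cc': [2204, -1],
        'age_of_vehicle': [12, -1],
        'driver_imd_decile': [2, -1],
    },
    index=pd.Index([2017010009342, 2017010009344], name='accident_index'),
)

# Assumption: -1 means "missing or out of range" in these coded columns.
# Only int64 columns are converted; they become float64 so they can hold NaN.
coded_cols = vehicles.select_dtypes('int64').columns
vehicles[coded_cols] = vehicles[coded_cols].replace(-1, np.nan)

print(vehicles.isna().sum())
```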

In [22]:
vehicles.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1035534 entries, 2017010001708 to 2021991201030
Data columns (total 27 columns):
 #   Column                            Non-Null Count    Dtype 
---  ------                            --------------    ----- 
 0   accident_year                     1035534 non-null  int64 
 1   accident_reference                1035534 non-null  object
 2   vehicle_reference                 1035534 non-null  int64 
 3   vehicle_type                      1035534 non-null  int64 
 4   towing_and_articulation           1035534 non-null  int64 
 5   vehicle_manoeuvre                 1035534 non-null  int64 
 6   vehicle_direction_from            1035534 non-null  int64 
 7   vehicle_direction_to              1035534 non-null  int64 
 8   vehicle_location_restricted_lane  1035534 non-null  int64 
 9   junction_location                 1035534 non-null  int64 
 10  skidding_and_overturning          1035534 non-null  int64 
 11  hit_object_in_carriageway         103
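
The row counts from the three `info()` calls quantify the granularity of each file: 562,439 accidents, 728,541 casualties and 1,035,534 vehicles, i.e. roughly 1.30 casualties and 1.84 vehicles per accident. Attaching accident-level columns to each vehicle row is therefore a many-to-one join. The sketch below shows one way to do it with `DataFrame.merge` on the shared `accident_index`, using excerpts of the accidents and vehicles previews; the choice of columns is illustrative only.

```python
import pandas as pd

# Row counts reported by the three info() calls
N_ACCIDENTS, N_CASUALTIES, N_VEHICLES = 562439, 728541, 1035534
print(f"casualties per accident: {N_CASUALTIES / N_ACCIDENTS:.2f}")
print(f"vehicles per accident:   {N_VEHICLES / N_ACCIDENTS:.2f}")

# Excerpts of the previews: one row per accident, one row per vehicle
accidents = pd.DataFrame(
    {'accident_severity': [1, 3], 'number_of_vehicles': [2, 2]},
    index=pd.Index([2017010001708, 2017010009342], name='accident_index'),
)
vehicles = pd.DataFrame(
    {'vehicle_reference': [1, 2, 1, 2], 'vehicle_type': [9, 2, 9, 9]},
    index=pd.Index([2017010001708] * 2 + [2017010009342] * 2, name='accident_index'),
)

# Copy accident-level columns onto every vehicle row; validate raises a
# MergeError if accident_index were duplicated in the accidents table.
vehicle_level = vehicles.merge(
    accidents, left_index=True, right_index=True,
    how='left', validate='many_to_one',
)
print(vehicle_level)
```

`validate='many_to_one'` is a cheap guard: if `Accidents.csv` ever contained a duplicated `accident_index`, the merge would fail loudly instead of silently duplicating vehicle rows.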