# UK Traffic Accident Predictions (Road Safety Data)

## Problem Definition

We start from three .csv files that hold the 2020 UK data on accidents, casualties, and vehicles. They were downloaded from the official UK government site: https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data

We merge the three initial .csv files into one using the accident identifier, and then (working with a single table) filter it down to the data we need before implementing the models.

<img src='problemOverview.jpg' >

# Datasets Analysis

As the diagram above shows, each dataset has its own set of columns (although, as we will see, several of them appear in all three), so we filter them and keep only the columns selected in the steps below.

### Import Python libraries

In [4]:
import os
import pandas as pd

### Import datasets

In [5]:
df_vehicles = pd.read_csv('dft-road-casualty-statistics-vehicle-2020.csv')
df_accidents = pd.read_csv('dft-road-casualty-statistics-accident-2020.csv')
df_casualties = pd.read_csv('dft-road-casualty-statistics-casualty-2020.csv')
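Identifier columns such as `accident_index` and `accident_reference` are stored as text, so it can help to tell pandas their type up front instead of letting it infer one. A minimal sketch, assuming that is a concern for these files; the inline CSV text and the `id_dtypes` name are made up for illustration and are not taken from the real data.

```python
import io
import pandas as pd

# Made-up sample rows standing in for one of the 2020 CSV files.
sample_csv = io.StringIO(
    "accident_index,accident_year,accident_reference,number_of_vehicles\n"
    "2020010278554,2020,010278554,2\n"
    "202063D000120,2020,63D000120,1\n"
)

# Read identifier columns as strings so leading zeros and letters survive.
id_dtypes = {"accident_index": str, "accident_reference": str}
df_sample = pd.read_csv(sample_csv, dtype=id_dtypes)

print(df_sample.dtypes)
```

The same `dtype` argument could be passed to each of the three `pd.read_csv` calls above.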


We merge the three CSVs above using `pd.merge` with `how='inner'`. Because `merge` joins only two DataFrames at a time, we do it in two steps: (df_accidents + df_vehicles) + df_casualties.

In [7]:
# With no `on` argument, merge joins on every column name the two frames share.
df_traffic_accidents_v1 = pd.merge(df_accidents, df_vehicles, how='inner')
df_traffic_accidents = pd.merge(df_traffic_accidents_v1, df_casualties, how='inner')
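When `on` is omitted, `pd.merge` uses every column name common to both frames as a join key. The sketch below uses small made-up frames (not the real data) to list those shared columns and pass them explicitly, which makes the keys visible. The `validate` argument is an optional safeguard that checks the expected key relationship; the names `keys_av` and `keys_sc` are illustrative.

```python
import pandas as pd

# Toy frames with made-up values; column names mirror the real datasets.
accidents = pd.DataFrame({
    "accident_index": ["A1", "A2"],
    "accident_year": [2020, 2020],
    "accident_severity": [2, 3],
})
vehicles = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2"],
    "accident_year": [2020, 2020, 2020],
    "vehicle_reference": [1, 2, 1],
})
casualties = pd.DataFrame({
    "accident_index": ["A1", "A2"],
    "accident_year": [2020, 2020],
    "vehicle_reference": [2, 1],
    "casualty_severity": [2, 3],
})

# Columns shared by both frames: these are the keys merge uses by default.
keys_av = [c for c in accidents.columns if c in vehicles.columns]
step1 = pd.merge(accidents, vehicles, how="inner", on=keys_av,
                 validate="one_to_many")  # one accident, many vehicles

keys_sc = [c for c in step1.columns if c in casualties.columns]
merged = pd.merge(step1, casualties, how="inner", on=keys_sc)

print(keys_av)
print(keys_sc)
print(merged)
```

In this toy example the second join also matches on `vehicle_reference`, because both frames carry it, so each casualty is paired only with its own vehicle.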

In [8]:
df_traffic_accidents.head()

Unnamed: 0,accident_index,accident_year,accident_reference,location_easting_osgr,location_northing_osgr,longitude,latitude,police_force,accident_severity,number_of_vehicles,...,age_band_of_casualty,casualty_severity,pedestrian_location,pedestrian_movement,car_passenger,bus_or_coach_passenger,pedestrian_road_maintenance_worker,casualty_type,casualty_home_area_type,casualty_imd_decile
0,2020010278554,2020,10278554,531639.0,168889.0,-0.108858,51.403761,1,2,2,...,5,2,0,0,0,0,0,3,-1,-1
1,2020010278556,2020,10278556,528687.0,184702.0,-0.145519,51.546549,1,2,2,...,4,2,0,0,0,0,0,3,1,1
2,2020010278558,2020,10278558,534296.0,179438.0,-0.066682,51.497938,1,3,2,...,7,3,0,0,0,0,0,3,1,3
3,2020010278559,2020,10278559,530354.0,172580.0,-0.125965,51.437228,1,3,2,...,7,3,0,0,0,0,0,3,1,6
4,2020010278561,2020,10278561,534946.0,184874.0,-0.055243,51.546633,1,2,1,...,7,2,1,1,0,0,0,0,1,2


In [9]:
df_traffic_accidents.dtypes

accident_index                         object
accident_year                           int64
accident_reference                     object
location_easting_osgr                 float64
location_northing_osgr                float64
                                       ...   
bus_or_coach_passenger                  int64
pedestrian_road_maintenance_worker      int64
casualty_type                           int64
casualty_home_area_type                 int64
casualty_imd_decile                     int64
Length: 74, dtype: object

In [24]:
df_traffic_accidents['accident_severity'].unique().tolist()

[2, 3, 1]
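The three codes are easier to read with labels. The sketch below assumes the usual DfT road safety coding (1 = Fatal, 2 = Serious, 3 = Slight); it should be checked against the data guide published with the dataset before relying on it. The `severity_labels` dict and the sample Series are illustrative.

```python
import pandas as pd

# Assumed coding, to be verified against the dataset's data guide.
severity_labels = {1: "Fatal", 2: "Serious", 3: "Slight"}

# Made-up sample standing in for df_traffic_accidents['accident_severity'].
severity = pd.Series([2, 3, 3, 1, 3], name="accident_severity")

# Map codes to labels and count how many rows fall in each class.
severity_counts = severity.map(severity_labels).value_counts()
print(severity_counts)
```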

We drop the columns we are not interested in.

In [25]:
# Keep only the columns of interest and drop everything else.
columns_to_keep = ['accident_index', 'accident_severity', 'number_of_vehicles',
                   'number_of_casualties', 'date', 'day_of_week', 'time',
                   'did_police_officer_attend_scene_of_accident', 'sex_of_driver',
                   'age_of_driver', 'age_of_vehicle', 'generic_make_model',
                   'sex_of_casualty', 'age_of_casualty', 'casualty_severity',
                   'car_passenger']
df_traffic_accidents.drop(columns=df_traffic_accidents.columns.difference(columns_to_keep), inplace=True)
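`columns.difference` silently ignores names that do not exist in the frame, so a misspelled column name is dropped along with everything else and no error is raised. A small sketch of a stricter alternative follows; `keep_columns` is a hypothetical helper, not part of pandas, and the sample frame is made up.

```python
import pandas as pd

def keep_columns(df: pd.DataFrame, wanted: list[str]) -> pd.DataFrame:
    """Return a copy of df with only the wanted columns, failing on unknown names."""
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df.loc[:, wanted].copy()

# Made-up sample frame with one column we do not want to keep.
sample = pd.DataFrame({
    "accident_index": ["A1", "A2"],
    "accident_severity": [2, 3],
    "longitude": [-0.10, -0.14],
})

kept = keep_columns(sample, ["accident_index", "accident_severity"])
print(kept.columns.tolist())
```

Selecting by an explicit list also fixes the column order, which the drop-based approach leaves as it was in the merged frame.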

In [28]:
df_traffic_accidents

Unnamed: 0,accident_index,accident_severity,number_of_vehicles,number_of_casualties,date,day_of_week,time,did_police_officer_attend_scene_of_accident,sex_of_driver,age_of_driver,age_of_vehicle,generic_make_model,sex_of_casualty,age_of_casualty,casualty_severity,car_passenger
0,2020010278554,2,2,1,09/11/2020,2,20:20,1,1,23,4,SUZUKI UK 110,1,23,2,0
1,2020010278556,2,2,1,04/11/2020,4,11:50,3,1,17,4,HONDA CBR125R,1,17,2,0
2,2020010278558,3,2,1,09/11/2020,2,19:31,1,1,45,0,YAMAHA GPD 125,1,45,3,0
3,2020010278559,3,2,1,09/11/2020,2,19:13,1,1,45,7,-1,1,45,3,0
4,2020010278561,2,1,1,09/11/2020,2,18:03,1,1,52,2,ALEXANDER DENNIS MODEL MISSING,1,42,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92445,2020990963785,2,1,1,11/07/2020,7,19:29,2,1,51,20,YAMAHA YZF R1,1,51,2,0
92446,2020990963795,3,2,1,27/05/2020,4,18:27,1,1,36,-1,-1,1,36,3,0
92447,2020990963798,3,1,1,11/07/2020,7,09:48,1,1,83,6,MITSUBISHI ASX,2,40,3,0
92448,2020990963826,2,3,1,11/07/2020,7,13:25,2,1,52,21,-1,1,52,2,0


In [27]:
df_traffic_accidents.dtypes

accident_index                                 object
accident_severity                               int64
number_of_vehicles                              int64
number_of_casualties                            int64
date                                           object
day_of_week                                     int64
time                                           object
did_police_officer_attend_scene_of_accident     int64
sex_of_driver                                   int64
age_of_driver                                   int64
age_of_vehicle                                  int64
generic_make_model                             object
sex_of_casualty                                 int64
age_of_casualty                                 int64
casualty_severity                               int64
car_passenger                                   int64
dtype: object
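`date` and `time` are still plain strings (`object`), and several columns show -1 in the preview. A hedged sketch of one possible clean-up step before modelling, assuming the `DD/MM/YYYY` and `HH:MM` formats seen in the preview and assuming -1 marks a missing value in this dataset; the sample rows are illustrative, not the real file.

```python
import pandas as pd

# Illustrative rows shaped like the filtered table.
sample = pd.DataFrame({
    "date": ["09/11/2020", "27/05/2020"],
    "time": ["20:20", "18:27"],
    "age_of_vehicle": [4, -1],
})

# Combine date and time into a single timestamp column.
sample["timestamp"] = pd.to_datetime(
    sample["date"] + " " + sample["time"], format="%d/%m/%Y %H:%M"
)

# Treat -1 as missing (assumption); the nullable Int64 dtype keeps the rest as integers.
is_missing = sample["age_of_vehicle"] == -1
sample["age_of_vehicle"] = sample["age_of_vehicle"].astype("Int64").mask(is_missing)

print(sample.dtypes)
```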