# Capstone Project - Predicting Severity of an Accident

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this notebook, we try to predict the severity of an accident from information that is fairly easy to acquire, such as road, weather and light conditions.

Around the world, early warning systems predict impending disasters and notify the population ahead of time. The problem defined here is to apply the same concept to predicting the probability of an accident and its severity. The aim is to predict an accident using weather and local conditions, in addition to some driver- and vehicle-specific features.

This model is targeted at road safety authorities as well as the general public. The envisioned application is for road users to receive prior warning about which roads to avoid, or whether particular roads on a calculated route have a higher chance of accidents and warrant further attention and precautions.

## Data <a name = "data"> </a>

There are numerous accident datasets available, collected and hosted on government websites. The dataset used for this model is UK Road Safety: Traffic Accidents and Vehicles, sourced from Kaggle.

The dataset comprises two files:

1. Accident_Information.csv: every line in the file represents a unique traffic accident (identified by the Accident_Index column), with various properties of the accident as columns. Date range: 2005-2017
2. Vehicle_Information.csv: every line in the file represents the involvement of a unique vehicle in a unique traffic accident, with various vehicle and passenger properties as columns. Date range: 2004-2016

The two files can be linked through the unique traffic accident identifier (the Accident_Index column).

I selected attributes that describe the conditions prevailing when the accident occurred, namely weather, light and local environment attributes, in addition to vehicle and driver data fields.
Through an iterative process of visualization and modelling, the following features were finally chosen:

1. Light Conditions
2. Road Surface Conditions
3. Weather Conditions
4. Road Type
5. Sex of Driver
6. Day of the Week
7. Speed Limit
8. Driver Age Band
9. Urban or Rural Area
10. Junction Detail
11. Vehicle Type
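The chosen features can be pulled out of a merged DataFrame with a small helper. The following is a minimal sketch, not the notebook's own code. The helper `select_features` and the toy frame are hypothetical. Of the column names, only `Accident_Severity`, `Day_of_Week`, `Junction_Detail` and `Light_Conditions` appear in the `dtypes` listing in this notebook; the remaining names are assumptions and should be checked against the real column headers.

```python
import pandas as pd

# Column names for the eleven chosen features. Only Light_Conditions,
# Day_of_Week and Junction_Detail are confirmed by the dtypes listing;
# the other names are ASSUMED and may differ in the real files.
FEATURES = [
    "Light_Conditions", "Road_Surface_Conditions", "Weather_Conditions",
    "Road_Type", "Sex_of_Driver", "Day_of_Week", "Speed_limit",
    "Age_Band_of_Driver", "Urban_or_Rural_Area", "Junction_Detail",
    "Vehicle_Type",
]
TARGET = "Accident_Severity"


def select_features(frame, features=FEATURES, target=TARGET):
    """Return the available feature columns plus the target, and the names that were not found."""
    present = [col for col in features if col in frame.columns]
    missing = [col for col in features if col not in frame.columns]
    return frame[present + [target]].copy(), missing


# Toy frame with illustrative values (not taken from the dataset)
toy = pd.DataFrame({
    "Accident_Index": ["A1", "A2"],
    "Light_Conditions": ["Daylight", "Darkness - lights lit"],
    "Day_of_Week": ["Monday", "Friday"],
    "Junction_Detail": ["Roundabout", "Crossroads"],
    "Accident_Severity": ["Slight", "Serious"],
})

subset, missing = select_features(toy)
print(subset.columns.tolist())
print(missing)
```

Returning the missing names instead of raising makes it easy to spot a misspelled or renamed column before any modelling starts.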

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

In [6]:
# Read the accident information into a DataFrame
df_Acc = pd.read_csv("Accident_Information.csv")

# Read the vehicle information into a DataFrame
df_Veh = pd.read_csv("Vehicle_Information.csv", engine="python")

print(df_Acc.shape)
print(df_Veh.shape)

# Merge the two datasets with an inner join on Accident_Index
df = pd.merge(df_Acc, df_Veh, how="inner", on="Accident_Index")
df.shape

(2047256, 34)
(2177205, 24)


(2058408, 57)
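
The merged frame has fewer rows than the vehicle file, which suggests that some vehicle records have no matching accident (plausible given the different date ranges of the two files). One way to check where rows are lost is an outer join with `indicator=True`. This is a sketch on toy data, not part of the original analysis; the frames `accidents` and `vehicles` and their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the two files (illustrative values only)
accidents = pd.DataFrame({
    "Accident_Index": ["A1", "A2", "A3"],
    "Accident_Severity": ["Slight", "Serious", "Slight"],
})
vehicles = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A9"],
    "Vehicle_Type": ["Car", "Motorcycle", "Car", "Van"],
})

# An outer join with indicator=True adds a "_merge" column saying whether each
# row came from both frames, only the left one, or only the right one.
check = pd.merge(accidents, vehicles, how="outer", on="Accident_Index", indicator=True)
counts = check["_merge"].value_counts()
print(counts)

# The inner join keeps only the "both" rows. validate="one_to_many" raises
# a MergeError if Accident_Index is not unique in the accident table.
inner = pd.merge(accidents, vehicles, how="inner", on="Accident_Index",
                 validate="one_to_many")
print(inner.shape)
```

In the toy data, accident `A3` has no vehicle and vehicle `A9` has no accident, so both are dropped by the inner join. The same check on the real frames would show how many rows fall into each group.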

In [7]:
df.dtypes

Accident_Index                                  object
1st_Road_Class                                  object
1st_Road_Number                                float64
2nd_Road_Class                                  object
2nd_Road_Number                                float64
Accident_Severity                               object
Carriageway_Hazards                             object
Date                                            object
Day_of_Week                                     object
Did_Police_Officer_Attend_Scene_of_Accident    float64
Junction_Control                                object
Junction_Detail                                 object
Latitude                                       float64
Light_Conditions                                object
Local_Authority_(District)                      object
Local_Authority_(Highway)                       object
Location_Easting_OSGR                          float64
Location_Northing_OSGR                         float64
Longitude 