# The Premonition Project

## Introduction

### It is a nice sunny morning, and you might be tempted to go for a drive. Hold on! Did you know that more than 90 people die in road accidents every day in the US? Try as we may, accidents happen, and no solution in the foreseeable future can prevent all of them. But can accidents be predicted?
### In this project, I will explore the various risk factors that contribute to the occurrence of accidents. Knowing the most potent contributors will help drivers stay alert and take precautions beforehand. I will also examine which parts of a car are likely to be involved in the most severe accidents. This will give car designers insight into any structural changes those parts may need to make cars safer.


### Specifically, I will attempt to answer questions about how both the occurrence and the severity of accidents relate to factors such as location, road conditions, weather conditions and collision type. I will also try to unearth any other hidden causes of accidents. These insights will be useful to a number of stakeholders, such as passengers, manufacturers and insurers.

## Data

### We will be using the following dataset:

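```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the collisions data. low_memory=False parses the file in one pass so
# column types are inferred consistently (avoids a mixed-type DtypeWarning)
df = pd.read_csv("Data-Collisions.csv", low_memory=False)
df.head()
```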



|  | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight |  |  |  | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block |  | ... | Wet | Dark - Street Lights On |  | 6354039.0 |  | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block |  | ... | Dry | Daylight |  | 4323031.0 |  | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block |  | ... | Dry | Daylight |  |  |  | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight |  | 4028032.0 |  | 10 | Entering at angle | 0 | 0 | N |


```python
# Inspect the data type of each column
df.dtypes
```

```
SEVERITYCODE        int64
X                 float64
Y                 float64
OBJECTID            int64
INCKEY              int64
COLDETKEY           int64
REPORTNO           object
STATUS             object
ADDRTYPE           object
INTKEY            float64
LOCATION           object
EXCEPTRSNCODE      object
EXCEPTRSNDESC      object
SEVERITYCODE.1      int64
SEVERITYDESC       object
COLLISIONTYPE      object
PERSONCOUNT         int64
PEDCOUNT            int64
PEDCYLCOUNT         int64
VEHCOUNT            int64
INCDATE            object
INCDTTM            object
JUNCTIONTYPE       object
SDOT_COLCODE        int64
SDOT_COLDESC       object
INATTENTIONIND     object
UNDERINFL          object
WEATHER            object
ROADCOND           object
LIGHTCOND          object
PEDROWNOTGRNT      object
SDOTCOLNUM        float64
SPEEDING           object
ST_COLCODE         object
ST_COLDESC         object
SEGLANEKEY          int64
CROSSWALKKEY        int64
HITPARKEDCAR       object
dtype: object
```

### The columns of the dataset will be used either on their own or in combination, alongside the severity of each accident. I will start with exploratory data analysis to investigate the relationships between variables, then move on to correlation analysis. Next, I will build several models to predict the likelihood and severity of an accident and choose the one best suited to the task based on the results. I will end with a list of recommendations drawn from the overall analysis and from what the model indicates.
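
A minimal sketch of the correlation-analysis step follows. It assumes the cleaned categorical columns are one-hot encoded with `pd.get_dummies` and each indicator column is correlated against a 0/1 flag for "Injury Collision". The helper name `severity_correlations` and the six-row `toy` frame are hypothetical, for illustration only. On the real data, `df` and the full list of categorical columns would take their place.

```python
import pandas as pd

def severity_correlations(frame, features, target="SEVERITYDESC", positive="Injury Collision"):
    """Hypothetical helper: correlate one-hot encoded features with an injury flag.

    Each category becomes a 0/1 indicator column. Its Pearson correlation with
    the 0/1 injury flag (the phi coefficient, since both are binary) is
    returned, sorted from most to least associated with injury.
    """
    X = pd.get_dummies(frame[features], dtype=int)
    y = frame[target].eq(positive).astype(int)
    return X.corrwith(y).sort_values(ascending=False)

# Hypothetical toy rows standing in for the cleaned collisions data
toy = pd.DataFrame({
    "ADDRTYPE": ["Intersection", "Block", "Intersection", "Block", "Alley", "Block"],
    "ROADCOND": ["Wet", "Dry", "Dry", "Wet", "Dry", "Dry"],
    "SEVERITYDESC": ["Injury Collision", "Property Damage Only Collision",
                     "Injury Collision", "Property Damage Only Collision",
                     "Property Damage Only Collision", "Injury Collision"],
})

result = severity_correlations(toy, ["ADDRTYPE", "ROADCOND"])
# On the real data, for example:
# severity_correlations(df, ["ADDRTYPE", "COLLISIONTYPE", "WEATHER", "ROADCOND",
#                            "LIGHTCOND", "SPEEDING", "UNDERINFL"])
```

A positive value means the category appears more often among injury collisions than among property-damage-only ones. A negative value means the opposite.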

## Methodology

### Data Pre-processing

```python
# Drop columns that are not used in this analysis to simplify the dataset
columns_to_drop = [
    'SEVERITYCODE.1', 'X', 'Y', 'OBJECTID', 'INCKEY', 'COLDETKEY', 'REPORTNO', 'STATUS',
    'INTKEY', 'LOCATION', 'EXCEPTRSNCODE', 'EXCEPTRSNDESC', 'INCDATE', 'INCDTTM',
    'JUNCTIONTYPE', 'SDOT_COLCODE', 'SDOT_COLDESC', 'INATTENTIONIND', 'PEDROWNOTGRNT',
    'SDOTCOLNUM', 'ST_COLCODE', 'ST_COLDESC', 'SEGLANEKEY', 'CROSSWALKKEY',
    'HITPARKEDCAR', 'PEDCOUNT', 'PEDCYLCOUNT', 'SEVERITYCODE',
]
df = df.drop(columns=columns_to_drop)
df.dtypes  # the columns used from here on
```

```
ADDRTYPE         object
SEVERITYDESC     object
COLLISIONTYPE    object
PERSONCOUNT       int64
VEHCOUNT          int64
UNDERINFL        object
WEATHER          object
ROADCOND         object
LIGHTCOND        object
SPEEDING         object
dtype: object
```

```python
# Fill missing categorical values with explicit placeholder categories.
# Assigning the result back (instead of calling fillna(inplace=True) on a
# single column) keeps working under pandas copy-on-write.
df = df.fillna({
    "ADDRTYPE": "Other",
    "COLLISIONTYPE": "Other",
    "ROADCOND": "Unknown",
    "WEATHER": "Unknown",
    "LIGHTCOND": "Unknown",
    "SPEEDING": "N",  # a missing speeding flag is treated as "N"
})

# UNDERINFL mixes "Y"/"N" with "1"/"0": fill blanks with "N", then map "0"/"1" to "N"/"Y"
df["UNDERINFL"] = df["UNDERINFL"].fillna("N").replace({"0": "N", "1": "Y"})
```

### Exploratory Data Analysis

```python
# Percentage of each severity level within each address type
table = (
    df.groupby("ADDRTYPE")["SEVERITYDESC"]
    .value_counts(normalize=True)  # share of each severity within its ADDRTYPE
    .mul(100)
    .round(2)
    .sort_index()
    .to_frame("Percentage")
)
table
```

| ADDRTYPE | SEVERITYDESC | Percentage |
|---|---|---|
| Alley | Injury Collision | 10.92 |
| Alley | Property Damage Only Collision | 89.08 |
| Block | Injury Collision | 23.71 |
| Block | Property Damage Only Collision | 76.29 |
| Intersection | Injury Collision | 42.75 |
| Intersection | Property Damage Only Collision | 57.25 |
| Other | Injury Collision | 9.92 |
| Other | Property Damage Only Collision | 90.08 |


#### 1. As the table shows, accidents at intersections are the most likely to be severe: 42.75% of them result in injury. Accidents in alleys are relatively less dangerous, with 89.08% resulting in property damage only.

```python
# Percentage of each severity level within each collision type
table = (
    df.groupby("COLLISIONTYPE")["SEVERITYDESC"]
    .value_counts(normalize=True)  # share of each severity within its COLLISIONTYPE
    .mul(100)
    .round(2)
    .sort_index()
    .to_frame("Percentage")
)
table
```

| COLLISIONTYPE | SEVERITYDESC | Percentage |
|---|---|---|
| Angles | Injury Collision | 39.29 |
| Angles | Property Damage Only Collision | 60.71 |
| Cycles | Injury Collision | 87.61 |
| Cycles | Property Damage Only Collision | 12.39 |
| Head On | Injury Collision | 43.08 |
| Head On | Property Damage Only Collision | 56.92 |
| Left Turn | Injury Collision | 39.49 |
| Left Turn | Property Damage Only Collision | 60.51 |
| Other | Injury Collision | 25.0 |
| Other | Property Damage Only Collision | 75.0 |


#### 2. Accidents in which pedestrians or cyclists are hit are very likely to cause injuries (87.61% of cycle collisions result in injury), while accidents involving parked cars and sideswipes tend to damage property only.

#### 3. The data show that speeding and driving under the influence are very highly correlated. Accidents involving either factor have a high share of injury collisions, although such cases are uncommon overall.
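
The correlation claimed in finding 3 can be checked directly from the two Y/N flag columns. The sketch below assumes both columns have already been normalized to "Y"/"N", as in the cleaning step. The Pearson correlation of their 0/1 encodings equals the phi coefficient for two binary variables, and it measures how strongly the flags co-occur. A grouped mean then gives the injury share for each flag value. The helper names and the eight-row `toy` frame are hypothetical.

```python
import pandas as pd

def flag_correlation(a, b):
    """Hypothetical helper: phi coefficient between two Y/N flag columns."""
    # For two binary variables, Pearson correlation of the 0/1 codes is phi
    return (a == "Y").astype(int).corr((b == "Y").astype(int))

def injury_share_by_flag(frame, flag, target="SEVERITYDESC", positive="Injury Collision"):
    """Hypothetical helper: percentage of injury collisions for each flag value."""
    is_injury = frame[target].eq(positive)
    return is_injury.groupby(frame[flag]).mean().mul(100).round(2)

INJ, PDO = "Injury Collision", "Property Damage Only Collision"
# Hypothetical toy rows (not taken from the dataset)
toy = pd.DataFrame({
    "SPEEDING":     ["Y", "Y", "Y", "N", "N", "N", "N", "N"],
    "UNDERINFL":    ["Y", "Y", "N", "N", "N", "N", "Y", "N"],
    "SEVERITYDESC": [INJ, INJ, PDO, PDO, PDO, INJ, PDO, PDO],
})

phi = flag_correlation(toy["SPEEDING"], toy["UNDERINFL"])
shares = injury_share_by_flag(toy, "SPEEDING")
```

For the toy rows, the phi coefficient is 7/15 (about 0.47). On the real data, `flag_correlation(df["SPEEDING"], df["UNDERINFL"])` and `injury_share_by_flag(df, "SPEEDING")` would give the corresponding figures.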

```python
# Number of accidents of each severity under each road condition
table = df.groupby(["ROADCOND", "SEVERITYDESC"]).size().to_frame("Accidents")
table
```

| ROADCOND | SEVERITYDESC | Accidents |
|---|---|---|
| Dry | Injury Collision | 40064 |
| Dry | Property Damage Only Collision | 84446 |
| Ice | Injury Collision | 273 |
| Ice | Property Damage Only Collision | 936 |
| Oil | Injury Collision | 24 |
| Oil | Property Damage Only Collision | 40 |
| Other | Injury Collision | 43 |
| Other | Property Damage Only Collision | 89 |
| Sand/Mud/Dirt | Injury Collision | 23 |
| Sand/Mud/Dirt | Property Damage Only Collision | 52 |


#### 4. Accidents on wet roads are slightly more likely to be severe than those on dry roads. Other road conditions are too uncommon to support any conclusion.
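
Finding 4 compares shares within each road condition rather than raw counts. The sketch below shows that computation using only the ROADCOND rows printed in the table above. The Wet rows are not shown there, so they are left out rather than guessed. Each count is divided by its road condition's total via `groupby(level=...).transform("sum")`. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the printed ROADCOND x SEVERITYDESC table (Wet rows not shown there)
counts = pd.Series(
    [40064, 84446, 273, 936, 24, 40, 43, 89, 23, 52],
    index=pd.MultiIndex.from_product(
        [["Dry", "Ice", "Oil", "Other", "Sand/Mud/Dirt"],
         ["Injury Collision", "Property Damage Only Collision"]],
        names=["ROADCOND", "SEVERITYDESC"],
    ),
    name="Accidents",
)

# Total accidents per road condition (the sample size behind each share)
totals = counts.groupby(level="ROADCOND").sum()

# Divide each count by the total for its own road condition, in percent
share = (counts / counts.groupby(level="ROADCOND").transform("sum") * 100).round(2)
injury_share = share.xs("Injury Collision", level="SEVERITYDESC")
```

On these rows, dry roads give an injury share of about 32.18%. The rarer conditions rest on totals of only 64 to 1,209 accidents, which is why no conclusion is drawn from them.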

```python
# Number of accidents of each severity under each lighting condition
table = df.groupby(["LIGHTCOND", "SEVERITYDESC"]).size().to_frame("Accidents")
table
```

| LIGHTCOND | SEVERITYDESC | Accidents |
|---|---|---|
| Dark - No Street Lights | Injury Collision | 334 |
| Dark - No Street Lights | Property Damage Only Collision | 1203 |
| Dark - Street Lights Off | Injury Collision | 316 |
| Dark - Street Lights Off | Property Damage Only Collision | 883 |
| Dark - Street Lights On | Injury Collision | 14475 |
| Dark - Street Lights On | Property Damage Only Collision | 34032 |
| Dark - Unknown Lighting | Injury Collision | 4 |
| Dark - Unknown Lighting | Property Damage Only Collision | 7 |
| Dawn | Injury Collision | 824 |
| Dawn | Property Damage Only Collision | 1678 |


### Results

#### From the above analysis, we may conclude that the following are the main risk factors for severe accidents:
#### 1. Accidents at intersections
#### 2. Accidents in which pedestrians or cyclists are hit
#### 3. Speeding
#### 4. Driving under the influence
#### 5. Wet roads
#### 6. Dark conditions