<h1>Predicting the severity of car accidents in Seattle</h1>

<h2>
    The Business Problem
</h2>

<h4>
    The problem at hand is to predict the severity of car accidents that occur in Seattle. The target audience is everyone who travels on Seattle's streets.
    The project will show them which factors may increase or decrease the potential severity of an accident. Knowing this will allow them to take precautions
    that lower their risk of harm.
</h4>

<h2>
Description of the Data
</h2>

In [2]:
# Import the libraries used in this notebook

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


In [3]:
# Pulling in the data

df = pd.read_csv("Data-Collisions.csv")
df.head()

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [4]:
df.dtypes

SEVERITYCODE        int64
X                 float64
Y                 float64
OBJECTID            int64
INCKEY              int64
COLDETKEY           int64
REPORTNO           object
STATUS             object
ADDRTYPE           object
INTKEY            float64
LOCATION           object
EXCEPTRSNCODE      object
EXCEPTRSNDESC      object
SEVERITYCODE.1      int64
SEVERITYDESC       object
COLLISIONTYPE      object
PERSONCOUNT         int64
PEDCOUNT            int64
PEDCYLCOUNT         int64
VEHCOUNT            int64
INCDATE            object
INCDTTM            object
JUNCTIONTYPE       object
SDOT_COLCODE        int64
SDOT_COLDESC       object
INATTENTIONIND     object
UNDERINFL          object
WEATHER            object
ROADCOND           object
LIGHTCOND          object
PEDROWNOTGRNT      object
SDOTCOLNUM        float64
SPEEDING           object
ST_COLCODE         object
ST_COLDESC         object
SEGLANEKEY          int64
CROSSWALKKEY        int64
HITPARKEDCAR       object
dtype: object


Given what we are trying to predict, there are a number of variables we could examine. However, using all of them would make the analysis unwieldy, so I have decided to focus on the variables below:


<ul>
    <li>COLLISIONTYPE</li>
    <li>WEATHER</li>
    <li>SPEEDING</li>
    <li>ROADCOND</li>
    <li>LIGHTCOND</li>
    <li>SEVERITYCODE</li>
</ul>



Below, we pull out the columns we will be looking at:

In [5]:
# Table with just the columns we need; .copy() gives an independent table that is safe to modify
df_data = df[['COLLISIONTYPE','WEATHER','SPEEDING','ROADCOND','LIGHTCOND','SEVERITYCODE']].copy()
df_data.head(10)

Unnamed: 0,COLLISIONTYPE,WEATHER,SPEEDING,ROADCOND,LIGHTCOND,SEVERITYCODE
0,Angles,Overcast,,Wet,Daylight,2
1,Sideswipe,Raining,,Wet,Dark - Street Lights On,1
2,Parked Car,Overcast,,Dry,Daylight,1
3,Other,Clear,,Dry,Daylight,1
4,Angles,Raining,,Wet,Daylight,2
5,Angles,Clear,,Dry,Daylight,1
6,Angles,Raining,,Wet,Daylight,1
7,Cycles,Clear,,Dry,Daylight,2
8,Parked Car,Clear,,Dry,Daylight,1
9,Angles,Clear,,Dry,Daylight,2


Replace the NaN values in SPEEDING with 'N' to signify that no speeding was reported.

In [6]:
df_data['SPEEDING'] = df_data['SPEEDING'].fillna('N')
df_data

Unnamed: 0,COLLISIONTYPE,WEATHER,SPEEDING,ROADCOND,LIGHTCOND,SEVERITYCODE
0,Angles,Overcast,N,Wet,Daylight,2
1,Sideswipe,Raining,N,Wet,Dark - Street Lights On,1
2,Parked Car,Overcast,N,Dry,Daylight,1
3,Other,Clear,N,Dry,Daylight,1
4,Angles,Raining,N,Wet,Daylight,2
...,...,...,...,...,...,...
194668,Head On,Clear,N,Dry,Daylight,2
194669,Rear Ended,Raining,N,Wet,Daylight,1
194670,Left Turn,Clear,N,Dry,Daylight,2
194671,Cycles,Clear,N,Dry,Dusk,2
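Calling `fillna(..., inplace=True)` on a column selected from another DataFrame can trigger a `SettingWithCopyWarning`, and with pandas' copy-on-write behavior it may leave the table unchanged. The sketch below shows the assignment form on a small frame. The `sample` frame and its values are hypothetical stand-ins shaped like the table above, not rows from the collision file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_data: a few rows shaped like the table above.
sample = pd.DataFrame({
    "SPEEDING": [np.nan, "Y", np.nan, np.nan],
    "SEVERITYCODE": [2, 1, 1, 2],
})

# Assign the filled column back instead of using inplace=True on a selection.
sample["SPEEDING"] = sample["SPEEDING"].fillna("N")

# Confirm that no missing values remain in the column.
remaining_missing = int(sample["SPEEDING"].isna().sum())
print(sample["SPEEDING"].tolist(), remaining_missing)
```

Assigning the result back makes the change explicit and behaves the same whether or not the table is a copy.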


As a preliminary investigation, let's compare each variable with the severity to see whether we can find any relationship between them, starting with SPEEDING.
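One possible way to run this comparison for several categorical columns at once is a row-normalized cross-tabulation. The sketch below is an assumption about how that could look, not the method used in this notebook. It uses only the ten rows displayed by `df_data.head(10)` above, which are far too few to draw conclusions from; the names `rows`, `features` and `shares` are hypothetical.

```python
import pandas as pd

# The ten rows shown by df_data.head(10); SPEEDING is left out because it is NaN there.
rows = pd.DataFrame({
    "COLLISIONTYPE": ["Angles", "Sideswipe", "Parked Car", "Other", "Angles",
                      "Angles", "Angles", "Cycles", "Parked Car", "Angles"],
    "WEATHER": ["Overcast", "Raining", "Overcast", "Clear", "Raining",
                "Clear", "Raining", "Clear", "Clear", "Clear"],
    "ROADCOND": ["Wet", "Wet", "Dry", "Dry", "Wet",
                 "Dry", "Wet", "Dry", "Dry", "Dry"],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On", "Daylight", "Daylight",
                  "Daylight", "Daylight", "Daylight", "Daylight", "Daylight",
                  "Daylight"],
    "SEVERITYCODE": [2, 1, 1, 1, 2, 1, 1, 2, 1, 2],
})

features = ["COLLISIONTYPE", "WEATHER", "ROADCOND", "LIGHTCOND"]

# For each feature, compute the share of each severity code within each category.
shares = {
    col: pd.crosstab(rows[col], rows["SEVERITYCODE"], normalize="index")
    for col in features
}

print(shares["ROADCOND"])
```

With `normalize="index"`, every row of each table sums to 1, so categories with very different collision counts can be compared directly.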

In [7]:
df_Col = df_data[['SPEEDING','SEVERITYCODE']]
df_Col

Unnamed: 0,SPEEDING,SEVERITYCODE
0,N,2
1,N,1
2,N,1
3,N,1
4,N,2
...,...,...
194668,N,2
194669,N,1
194670,N,2
194671,N,2


In [8]:
# Count collisions for each combination of speeding flag and severity code
speeding_severity_totals = df_Col.groupby(["SPEEDING","SEVERITYCODE"])["SEVERITYCODE"].count().reset_index(name="count")

# Total number of collisions without speeding (rows 0 and 1) and with speeding (rows 2 and 3)
n_tot = speeding_severity_totals["count"][0] + speeding_severity_totals["count"][1]
y_tot = speeding_severity_totals["count"][2] + speeding_severity_totals["count"][3]

# Show the raw counts; they are not yet normalized by n_tot and y_tot
speeding_severity_totals

Unnamed: 0,SPEEDING,SEVERITYCODE,count
0,N,1,130683
1,N,2,54657
2,Y,1,5802
3,Y,2,3531
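The raw counts are hard to compare because far fewer collisions involve speeding. A minimal sketch of one way to normalize them follows, dividing each count by the total for its SPEEDING group. The counts are copied from the table above; the names `totals`, `group_size` and `share` are hypothetical.

```python
import pandas as pd

# Counts copied from the grouped table above.
totals = pd.DataFrame({
    "SPEEDING": ["N", "N", "Y", "Y"],
    "SEVERITYCODE": [1, 2, 1, 2],
    "count": [130683, 54657, 5802, 3531],
})

# Total collisions per SPEEDING group, aligned row by row with the table.
group_size = totals.groupby("SPEEDING")["count"].transform("sum")

# Share of each severity code within its SPEEDING group.
totals["share"] = totals["count"] / group_size

print(totals.round(3))
```

On these counts, severity code 2 makes up about 29.5% of collisions without speeding and about 37.8% of collisions with speeding.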


In [11]:
speeding_severity_totals.loc[speeding_severity_totals["SPEEDING"] == "N"]

Unnamed: 0,SPEEDING,SEVERITYCODE,count
0,N,1,130683
1,N,2,54657
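`.loc` is an indexer, so it takes square brackets rather than a function call. The sketch below again uses the counts from the grouped table above. It selects one SPEEDING group with a boolean mask and sums its counts by label instead of by row position. The names `counts` and `not_speeding` are hypothetical.

```python
import pandas as pd

# Counts copied from the grouped table above.
counts = pd.DataFrame({
    "SPEEDING": ["N", "N", "Y", "Y"],
    "SEVERITYCODE": [1, 2, 1, 2],
    "count": [130683, 54657, 5802, 3531],
})

# A boolean mask inside square brackets selects the matching rows.
not_speeding = counts.loc[counts["SPEEDING"] == "N"]

# A row mask plus a column label selects the counts for one group.
n_tot = counts.loc[counts["SPEEDING"] == "N", "count"].sum()
y_tot = counts.loc[counts["SPEEDING"] == "Y", "count"].sum()

print(not_speeding)
print(n_tot, y_tot)
```

Selecting by mask keeps the totals correct even if the row order of the grouped table changes.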