# Predicting The Severity of Car Accidents in Seattle

## Introduction

Car accidents occur very frequently around the world, and many of them are related to road conditions and the environment. The idea is to build a tool that, based on the condition of the road, the weather, and the drivers, estimates the possibility of having an accident as well as its severity, so that travelers can change their travel plans or, if that is not possible, be more attentive and careful.
To this end, we are going to work with collision data from the city of Seattle, which will allow us to put together a model that gives the best results. This data records past accidents together with road conditions, weather, light, driver attention, drug or alcohol use, and speeding.
At the end, a report will present the model that meets the proposed objective. It will also consider results based on possible inappropriate behavior on the part of the drivers (lack of attention, going too fast, or consuming drugs or alcohol). This last point seeks to alert drivers to their own behavior as well as that of others.

## Imports

In [2]:
%matplotlib inline
import pandas as pd
import numpy as np

## Data Understanding

The dataset contains collisions recorded in the city of Seattle. Its attributes are as follows:

<table>
    <tr>
        <th>Attribute</th>
        <th>Data type, length</th>
        <th>Description</th>
    </tr>
<tr><td>OBJECTID</td><td>ObjectID</td><td>ESRI unique identifier</td></tr>
<tr><td>SHAPE</td><td>Geometry</td><td>ESRI geometry field</td></tr>
<tr><td>INCKEY</td><td>Long</td><td>A unique key for the incident</td></tr>
<tr><td>COLDETKEY</td><td>Long</td><td>Secondary key for the incident</td></tr>
<tr><td>ADDRTYPE</td><td>Text, 12</td><td>Collision address type:
• Alley
• Block
• Intersection</td></tr>
<tr><td>INTKEY</td><td>Double</td><td>Key that corresponds to the intersection associated with a collision </td></tr>
<tr><td>LOCATION</td><td>Text, 255</td><td>Description of the general location of the collision</td></tr>
<tr><td>EXCEPTRSNCODE</td><td>Text, 10</td><td></td></tr>
<tr><td>EXCEPTRSNDESC</td><td>Text, 300</td><td></td></tr>
<tr><td>SEVERITYCODE</td><td>Text, 100</td><td>A code that corresponds to the severity of the
collision:
• 3—fatality
• 2b—serious injury
• 2—injury
• 1—prop damage
• 0—unknown</td></tr>
<tr><td>SEVERITYDESC</td><td>Text</td><td>A detailed description of the severity of the collision</td></tr>
<tr><td>COLLISIONTYPE</td><td>Text, 300</td><td>Collision type</td></tr>
<tr><td>PERSONCOUNT</td><td>Double</td><td>The total number of people involved in the collision</td></tr>
<tr><td>PEDCOUNT</td><td>Double</td><td>The number of pedestrians involved in the collision. This is entered by the state.</td></tr>
<tr><td>PEDCYLCOUNT</td><td>Double</td><td>The number of bicycles involved in the collision. This is entered by the state.</td></tr>
<tr><td>VEHCOUNT</td><td>Double</td><td>The number of vehicles involved in the collision. This is entered by the state.</td></tr>
<tr><td>INJURIES</td><td>Double</td><td>The number of total injuries in the collision. This is entered by the state.</td></tr>
<tr><td>SERIOUSINJURIES</td><td>Double</td><td>The number of serious injuries in the collision. This is entered by the state.</td></tr>
<tr><td>FATALITIES</td><td>Double</td><td>The number of fatalities in the collision. This is entered by the state.</td></tr>
<tr><td>INCDATE</td><td>Date</td><td>The date of the incident.</td></tr>
<tr><td>INCDTTM</td><td>Text, 30</td><td>The date and time of the incident.</td></tr>
<tr><td>JUNCTIONTYPE</td><td>Text, 300</td><td>Category of junction at which collision took place</td></tr>
<tr><td>SDOT_COLCODE</td><td>Text, 10</td><td>A code given to the collision by SDOT.</td></tr>
<tr><td>SDOT_COLDESC</td><td>Text, 300</td><td>A description of the collision corresponding to the collision code.</td></tr>
<tr><td>INATTENTIONIND</td><td>Text, 1</td><td>Whether or not collision was due to inattention. (Y/N)</td></tr>
<tr><td>UNDERINFL</td><td>Text, 10</td><td>Whether or not a driver involved was under the influence of drugs or alcohol.</td></tr>
<tr><td>WEATHER</td><td>Text, 300</td><td>A description of the weather conditions during the time of the collision.</td></tr>
<tr><td>ROADCOND</td><td>Text, 300</td><td>The condition of the road during the collision.</td></tr>
<tr><td>LIGHTCOND</td><td>Text, 300</td><td>The light conditions during the collision.</td></tr>
<tr><td>PEDROWNOTGRNT</td><td>Text, 1</td><td>Whether or not the pedestrian right of way was not granted. (Y/N)</td></tr>
<tr><td>SDOTCOLNUM</td><td>Text, 10</td><td>A number given to the collision by SDOT.</td></tr>
<tr><td>SPEEDING</td><td>Text, 1</td><td>Whether or not speeding was a factor in the collision. (Y/N)</td></tr>
<tr><td>ST_COLCODE</td><td>Text, 10</td><td>A code provided by the state that describes the collision. For more information about these codes, please see the State Collision Code Dictionary.</td></tr>
<tr><td>ST_COLDESC</td><td>Text, 300</td><td>A description that corresponds to the state’s coding designation.</td></tr>
<tr><td>SEGLANEKEY</td><td>Long</td><td>A key for the lane segment in which the collision occurred.</td></tr>
<tr><td>CROSSWALKKEY</td><td>Long</td><td>A key for the crosswalk at which the collision occurred.</td></tr>
<tr><td>HITPARKEDCAR</td><td>Text, 1</td><td>Whether or not the collision involved hitting a parked car. (Y/N)</td></tr>
</table>

### Download the dataset and load data from CSV file:

In [1]:
!wget -O Data-Collisions.csv https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv

--2020-08-30 08:16:36--  https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv
Resolving s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)... 67.228.254.196
Connecting to s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73917638 (70M) [text/csv]
Saving to: ‘Data-Collisions.csv’


2020-08-30 08:16:39 (20.2 MB/s) - ‘Data-Collisions.csv’ saved [73917638/73917638]



In [3]:
df = pd.read_csv('Data-Collisions.csv')
df.head()



Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [4]:
df.shape

(194673, 38)

### Review the dataset

Let's see how many null values exist in each column of our data set.

In [12]:
df.isnull().sum()

SEVERITYCODE           0
X                   5334
Y                   5334
OBJECTID               0
INCKEY                 0
COLDETKEY              0
REPORTNO               0
STATUS                 0
ADDRTYPE            1926
INTKEY            129603
LOCATION            2677
EXCEPTRSNCODE     109862
EXCEPTRSNDESC     189035
SEVERITYCODE.1         0
SEVERITYDESC           0
COLLISIONTYPE       4904
PERSONCOUNT            0
PEDCOUNT               0
PEDCYLCOUNT            0
VEHCOUNT               0
INCDATE                0
INCDTTM                0
JUNCTIONTYPE        6329
SDOT_COLCODE           0
SDOT_COLDESC           0
INATTENTIONIND    164868
UNDERINFL           4884
WEATHER             5081
ROADCOND            5012
LIGHTCOND           5170
PEDROWNOTGRNT     190006
SDOTCOLNUM         79737
SPEEDING          185340
ST_COLCODE            18
ST_COLDESC          4904
SEGLANEKEY             0
CROSSWALKKEY           0
HITPARKEDCAR           0
dtype: int64
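
Raw null counts are easier to compare as a fraction of the rows. The following is a minimal sketch, assuming a helper named `missing_share` and a 50% cut-off, both of which are illustrative and not part of the original notebook. The sample frame is made up and is not taken from `Data-Collisions.csv`.

```python
import numpy as np
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of null values per column, largest first."""
    return frame.isnull().mean().sort_values(ascending=False)


# Tiny made-up sample (not real rows from the data set) to show the idea.
sample = pd.DataFrame({
    "SPEEDING": ["Y", np.nan, np.nan, np.nan],
    "WEATHER": ["Clear", "Raining", np.nan, "Clear"],
    "SEVERITYCODE": [1, 2, 1, 1],
})

shares = missing_share(sample)
# Hypothetical cut-off: list columns where more than half of the rows are null.
mostly_empty = shares[shares > 0.5].index.tolist()
print(shares)
print(mostly_empty)
```

On the real data, the same helper could be called as `missing_share(df)`.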

Let's see how many rows there are for each severity code in our data set.

In [14]:
df['SEVERITYCODE'].value_counts()

1    136485
2     58188
Name: SEVERITYCODE, dtype: int64

We can see that only 2 of the 5 codes appear in the data: 1 (property damage) and 2 (injury).
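
To put the two counts in proportion, here is a small sketch that rebuilds the counts printed above and converts them to shares. The variable names are illustrative. On the loaded frame, `df['SEVERITYCODE'].value_counts(normalize=True)` should give the same figures.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
severity_counts = pd.Series({1: 136485, 2: 58188}, name="SEVERITYCODE")

# Share of each severity code among all rows.
severity_share = severity_counts / severity_counts.sum()
print(severity_share.round(3))
```

Roughly 70% of the rows are code 1 and 30% are code 2.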

Let's see how many accidents involving inattention are in our data set.

In [16]:
df['INATTENTIONIND'].value_counts()

Y    29805
Name: INATTENTIONIND, dtype: int64

It can be seen that only `Y` is recorded: 29,805 rows are flagged for inattention, and the remaining 164,868 rows are null.
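
A column that only ever contains `Y` or null can be turned into a 0/1 indicator. The sketch below assumes that a null means the factor was not recorded for that collision, which is an assumption about the data and not something the source states. The helper name `flag_to_binary` and the sample values are illustrative.

```python
import numpy as np
import pandas as pd


def flag_to_binary(flag: pd.Series) -> pd.Series:
    """Map 'Y' to 1 and everything else (including nulls) to 0."""
    return (flag == "Y").astype(int)


# Made-up sample values, for illustration only.
sample = pd.Series(["Y", np.nan, np.nan, "Y", np.nan], name="INATTENTIONIND")
binary = flag_to_binary(sample)
print(binary.tolist())
```

The same helper could be applied to any other column that follows the same `Y`-or-null pattern.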

Let's see how many substance-related accidents are in our data set

In [17]:
df['UNDERINFL'].value_counts()

N    100274
0     80394
Y      5126
1      3995
Name: UNDERINFL, dtype: int64

We can see that although most rows have a value, the column mixes two encodings (`N`/`Y` and `0`/`1`), so it is necessary to normalize them.
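
One possible way to normalize the column is sketched below. It assumes that `N` and `0` both mean "not under the influence" and that `Y` and `1` both mean "under the influence". The mapping, the helper name `normalize_underinfl`, and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed mapping between the two encodings seen in the column.
UNDERINFL_MAP = {"N": 0, "0": 0, "Y": 1, "1": 1}


def normalize_underinfl(col: pd.Series) -> pd.Series:
    """Collapse the four encodings into 0/1, keeping nulls as <NA>."""
    # Cast to str first so that numeric 0/1 and text '0'/'1' are treated alike;
    # anything not in the mapping (including 'nan') becomes missing.
    as_text = col.astype(str).str.strip()
    return as_text.map(UNDERINFL_MAP).astype("Int64")


# Made-up sample values, for illustration only.
sample = pd.Series(["N", "0", "Y", "1", np.nan], name="UNDERINFL")
normalized = normalize_underinfl(sample)
print(normalized.tolist())
```

Under this mapping, the counts shown above would combine into 180,668 rows coded 0 and 9,121 rows coded 1.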

Let's see how many accidents occurred under each weather condition in our data set.

In [19]:
df['WEATHER'].value_counts()

Clear                       111135
Raining                      33145
Overcast                     27714
Unknown                      15091
Snowing                        907
Other                          832
Fog/Smog/Smoke                 569
Sleet/Hail/Freezing Rain       113
Blowing Sand/Dirt               56
Severe Crosswind                25
Partly Cloudy                    5
Name: WEATHER, dtype: int64

Let's see how many accidents occurred under each road condition in our data set.

In [20]:
df['ROADCOND'].value_counts()

Dry               124510
Wet                47474
Unknown            15078
Ice                 1209
Snow/Slush          1004
Other                132
Standing Water       115
Sand/Mud/Dirt         75
Oil                   64
Name: ROADCOND, dtype: int64

Let's see how many accidents occurred under each light condition in our data set.

In [21]:
df['LIGHTCOND'].value_counts()

Daylight                    116137
Dark - Street Lights On      48507
Unknown                      13473
Dusk                          5902
Dawn                          2502
Dark - No Street Lights       1537
Dark - Street Lights Off      1199
Other                          235
Dark - Unknown Lighting         11
Name: LIGHTCOND, dtype: int64
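
The `WEATHER`, `ROADCOND`, and `LIGHTCOND` columns each contain an `Unknown` category (15,091, 15,078, and 13,473 rows respectively). One option, which is an assumption here and not necessarily what is done later, is to treat `Unknown` as a missing value so it can be handled together with the nulls. The helper name `unknown_to_nan` and the sample frame are illustrative.

```python
import numpy as np
import pandas as pd

CONDITION_COLUMNS = ["WEATHER", "ROADCOND", "LIGHTCOND"]


def unknown_to_nan(frame: pd.DataFrame, columns=CONDITION_COLUMNS) -> pd.DataFrame:
    """Return a copy where the literal 'Unknown' becomes NaN in the given columns."""
    out = frame.copy()
    # mask() replaces values with NaN wherever the condition is True.
    out[columns] = out[columns].mask(out[columns].eq("Unknown"))
    return out


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "WEATHER": ["Clear", "Unknown", "Raining"],
    "ROADCOND": ["Dry", "Unknown", "Wet"],
    "LIGHTCOND": ["Daylight", "Dusk", "Unknown"],
})

cleaned = unknown_to_nan(sample)
print(cleaned.isnull().sum())
```

The function works on a copy, so the original frame is left unchanged.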

Let's see how many accidents have speeding recorded as a factor in our data set.

In [22]:
df['SPEEDING'].value_counts()

Y    9333
Name: SPEEDING, dtype: int64

Let's see how many accidents there are for each collision address type in our data set.

In [23]:
df['ADDRTYPE'].value_counts()

Block           126926
Intersection     65070
Alley              751
Name: ADDRTYPE, dtype: int64