## Applied Data Science Capstone

### Introduction/Business Problem

The Seattle government is concerned about the number and severity of traffic accidents and wants to act on an analysis of historical data to alert drivers when an accident is likely.

This study aims to predict the severity of an accident given its location, the weather and the road conditions.

The analysis also aims to identify factors that contribute to more severe accidents, so that road users can take preventive action.

The target audience of the project is drivers, rescue groups, police and insurance companies.

The expected outcome is a reduction in the number and severity of accidents, making drivers and passengers safer.

### About the dataset

This dataset covers collisions that occurred in the city of Seattle between 2004 and 2020. The __Data-Collisions.csv__ dataset includes details of 194673 collisions provided by the Seattle Department of Transportation Traffic Management Division.

It includes the following fields:

| Field          | Description |
|----------------|-------------|
| OBJECTID       | Unique identifier |
| ADDRTYPE       | Collision address type (Alley / Block / Intersection) |
| SEVERITYCODE   | A code that corresponds to the severity of the collision (3: fatality / 2b: serious injury / 2: injury / 1: prop damage / 0: unknown) |
| (X, Y)         | Spatial characteristics |
| INTKEY         | Key that corresponds to the intersection associated with a collision |
| LOCATION       | Description of the general location of the collision |
| SEVERITYDESC   | A detailed description of the severity of the collision |
| COLLISIONTYPE  | Collision type |
| PERSONCOUNT    | The total number of people involved in the collision |
| PEDCOUNT       | The number of pedestrians involved in the collision |
| PEDCYLCOUNT    | The number of bicycles involved in the collision |
| VEHCOUNT       | The number of vehicles involved in the collision |
| INCDTTM        | The date and time of the incident |
| JUNCTIONTYPE   | Category of junction at which the collision took place |
| SDOT_COLCODE   | A code given to the collision by SDOT |
| SDOT_COLDESC   | A description of the collision corresponding to the collision code |
| INATTENTIONIND | Whether or not the collision was due to inattention |
| UNDERINFL      | Whether or not a driver involved was under the influence of drugs or alcohol |
| WEATHER        | A description of the weather conditions at the time of the collision |
| ROADCOND       | The condition of the road during the collision |
| LIGHTCOND      | The light conditions during the collision |
| PEDROWNOTGRNT  | Whether or not the pedestrian right of way was not granted (Y/N) |
| SPEEDING       | Whether or not speeding was a factor in the collision (Y/N) |
| SEGLANEKEY     | A key for the lane segment in which the collision occurred |
| CROSSWALKKEY   | A key for the crosswalk at which the collision occurred |
| HITPARKEDCAR   | Whether or not the collision involved hitting a parked car (Y/N) |

### Methodology

### Reading the Data 

Set the download URL of the dataset

In [1]:
import pandas as pd
url = 'https://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv'

Load the data from the CSV file

In [2]:
df = pd.read_csv(url, low_memory=False)
df.head()

|   | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | LOCATION | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.344896 | 47.717173 | 1 | 1003 | 1003 | 3503158 | Matched | Block | NaN | AURORA AVE N BETWEEN N 117TH PL AND N 125TH ST | ... | Dry | Daylight | Y | NaN | NaN | 45.0 | Vehicle - Pedalcyclist | 15057 | 0 | N |
| 1 | -122.376467 | 47.543774 | 2 | 56200 | 56200 | 1795087 | Matched | Block | NaN | 35TH AVE SW BETWEEN SW MORGAN ST AND SW HOLLY ST | ... | Dry | Dark - Street Lights On | NaN | 6015003.0 | NaN | 0.0 | Vehicle going straight hits pedestrian | 0 | 0 | N |
| 2 | -122.360735 | 47.701487 | 3 | 327037 | 328537 | E979380 | Matched | Intersection | 37122.0 | 3RD AVE NW AND NW 100TH ST | ... | Wet | Daylight | NaN | NaN | NaN | 10.0 | Entering at angle | 0 | 0 | N |
| 3 | -122.297415 | 47.599233 | 4 | 327278 | 328778 | E996362 | Unmatched | Intersection | 30602.0 | M L KING JR WAY S AND S JACKSON ST | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | N |
| 4 | -122.368001 | 47.653585 | 5 | 1248 | 1248 | 3645424 | Unmatched | Block | NaN | W EWING ST BETWEEN 6TH AVE W AND W EWING PL | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | N |
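
If only some of the fields are needed, the read can be narrowed at load time. The following is a minimal sketch of one way to do that, not what this notebook ran: the helper `load_collisions`, the `FIELDS` selection and the inline sample CSV are all hypothetical, and `SEVERITYCODE` is read as text on the assumption that codes such as `2b` should not be mixed with integers.

```python
import io

import pandas as pd

# Hypothetical selection of fields; adjust to the analysis at hand
FIELDS = ["SEVERITYCODE", "ADDRTYPE", "WEATHER", "ROADCOND", "LIGHTCOND"]


def load_collisions(source, columns=FIELDS):
    """Read only the requested columns from a CSV path, URL or file-like object."""
    return pd.read_csv(
        source,
        usecols=columns,
        dtype={"SEVERITYCODE": str},  # keep '2' and '2b' as text labels
        low_memory=False,
    )


# Tiny inline sample standing in for the real file (made-up rows)
sample_csv = io.StringIO(
    "OBJECTID,SEVERITYCODE,ADDRTYPE,WEATHER,ROADCOND,LIGHTCOND\n"
    "1,2,Block,Clear,Dry,Daylight\n"
    "2,2b,Intersection,Raining,Wet,Dusk\n"
)
sample_df = load_collisions(sample_csv)
print(sample_df.dtypes)
```

Passing the `url` variable instead of `sample_csv` would apply the same selection to the full download.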


In [3]:
df.shape

(221525, 40)

In [4]:
df['SEVERITYCODE'].value_counts()

1     137671
2      58783
0      21615
2b      3105
3        350
Name: SEVERITYCODE, dtype: int64

Severity of the collision:

- 3 (fatality): 350
- 2b (serious injury): 3105
- 2 (injury): 58783
- 1 (prop damage): 137671
- 0 (unknown): 21615
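
The counts above show a strongly imbalanced target. As a rough illustration, the sketch below turns the counts reported by `value_counts()` into shares; the variable names are illustrative and the figures are copied from the output, not recomputed from the data.

```python
import pandas as pd

# Counts copied from the SEVERITYCODE value_counts() output above
severity_counts = pd.Series(
    {"1": 137671, "2": 58783, "0": 21615, "2b": 3105, "3": 350},
    name="SEVERITYCODE",
)

# Share of each class among rows that have a severity code
severity_shares = severity_counts / severity_counts.sum()
print(severity_shares.round(4))
```

On the DataFrame itself, `df['SEVERITYCODE'].value_counts(normalize=True)` should give the same shares directly.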

In [9]:
df['ADDRTYPE'].value_counts()

Block           144999
Intersection     71936
Alley              878
Name: ADDRTYPE, dtype: int64

Collision address type:

- Block: 144999
- Intersection: 71936
- Alley: 878

In [11]:
df['COLLISIONTYPE'].value_counts()

Parked Car    48551
Angles        35573
Rear Ended    34691
Other         24588
Sideswipe     18891
Left Turn     14115
Pedestrian     7666
Cycles         5932
Right Turn     3017
Head On        2188
Name: COLLISIONTYPE, dtype: int64

Collision type:

- Parked Car: 48551
- Angles: 35573
- Rear Ended: 34691
- Other: 24588
- Sideswipe: 18891
- Left Turn: 14115
- Pedestrian: 7666
- Cycles: 5932
- Right Turn: 3017
- Head On: 2188

In [12]:
df['INATTENTIONIND'].value_counts()

Y    30188
Name: INATTENTIONIND, dtype: int64
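
Only `Y` appears in this column, so every other row is presumably missing. A possible way to turn such a flag into a numeric feature is sketched below. It rests on an assumption the data does not confirm, namely that a missing value means the flag was not set; `flag_to_int` is a hypothetical helper.

```python
import pandas as pd


def flag_to_int(col: pd.Series) -> pd.Series:
    """Encode a Y-or-missing flag column as 1/0."""
    # Assumption: anything other than 'Y' (including missing) means "not flagged"
    return col.eq("Y").astype(int)


# Made-up sample with the same shape of values as INATTENTIONIND
sample = pd.Series(["Y", None, "Y", None, None], name="INATTENTIONIND")
print(flag_to_int(sample).tolist())  # [1, 0, 1, 0, 0]
```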

In [13]:
df['UNDERINFL'].value_counts()

N    103927
0     81676
Y      5399
1      4230
Name: UNDERINFL, dtype: int64
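
The output above mixes two encodings of the same flag (`N`/`Y` and `0`/`1`). One way to bring them onto a single scale is sketched here; the helper name `normalize_underinfl` and the choice of `Y`/`N` as the target encoding are assumptions made for illustration.

```python
import pandas as pd


def normalize_underinfl(col: pd.Series) -> pd.Series:
    """Map the mixed Y/N/1/0 labels onto a single Y/N encoding."""
    mapping = {"Y": "Y", "1": "Y", "N": "N", "0": "N"}
    # Cast to text first so integer 0/1 and string '0'/'1' are treated alike;
    # anything outside the mapping (including missing values) becomes NaN.
    return col.astype(str).str.strip().map(mapping)


# Made-up sample covering each kind of value
sample = pd.Series(["N", "0", "Y", "1", None, 0, 1])
normalized = normalize_underinfl(sample)
print(normalized.value_counts(dropna=False))
```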

In [14]:
df['WEATHER'].value_counts()

Clear                       114738
Raining                      34036
Overcast                     28552
Unknown                      15131
Snowing                        919
Other                          860
Fog/Smog/Smoke                 577
Sleet/Hail/Freezing Rain       116
Blowing Sand/Dirt               56
Severe Crosswind                26
Partly Cloudy                   10
Blowing Snow                     1
Name: WEATHER, dtype: int64
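
Several weather categories have very few rows. If they prove too sparse to model separately, one option is to fold them into `Other`. The sketch below illustrates this on the counts printed above; the helper `lump_rare` and the threshold of 1000 rows are arbitrary choices for illustration, not something the analysis prescribes.

```python
import pandas as pd

# Counts copied from the WEATHER value_counts() output above
weather_counts = pd.Series({
    "Clear": 114738, "Raining": 34036, "Overcast": 28552, "Unknown": 15131,
    "Snowing": 919, "Other": 860, "Fog/Smog/Smoke": 577,
    "Sleet/Hail/Freezing Rain": 116, "Blowing Sand/Dirt": 56,
    "Severe Crosswind": 26, "Partly Cloudy": 10, "Blowing Snow": 1,
})


def lump_rare(counts: pd.Series, min_count: int = 1000) -> dict:
    """Map each category to itself, or to 'Other' when it has fewer than min_count rows."""
    return {name: (name if n >= min_count else "Other") for name, n in counts.items()}


weather_map = lump_rare(weather_counts)
# Re-aggregate the counts under the lumped labels
lumped = weather_counts.groupby(weather_map).sum().sort_values(ascending=False)
print(lumped)
```

The same mapping could be applied to the column with `df['WEATHER'].map(weather_map)`.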

In [15]:
df['ROADCOND'].value_counts()

Dry               128588
Wet                48734
Unknown            15139
Ice                 1232
Snow/Slush          1014
Other                136
Standing Water       119
Sand/Mud/Dirt         77
Oil                   64
Name: ROADCOND, dtype: int64

In [16]:
df['LIGHTCOND'].value_counts()

Daylight                    119492
Dark - Street Lights On      50133
Unknown                      13532
Dusk                          6082
Dawn                          2609
Dark - No Street Lights       1579
Dark - Street Lights Off      1239
Other                          244
Dark - Unknown Lighting         23
Name: LIGHTCOND, dtype: int64
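
`WEATHER`, `ROADCOND` and `LIGHTCOND` each contain an explicit `Unknown` label in addition to any truly missing values. If these are to be handled together, a sketch like the following could recode `Unknown` as missing. Whether that is appropriate is a modelling decision; `mask_unknown` and the sample frame are hypothetical.

```python
import pandas as pd

CONDITION_COLUMNS = ["WEATHER", "ROADCOND", "LIGHTCOND"]


def mask_unknown(df: pd.DataFrame, columns=CONDITION_COLUMNS) -> pd.DataFrame:
    """Return a copy in which the label 'Unknown' is replaced by a missing value."""
    out = df.copy()
    for name in columns:
        # mask() sets entries to NaN wherever the condition is True
        out[name] = out[name].mask(out[name].eq("Unknown"))
    return out


# Made-up rows for illustration
sample = pd.DataFrame({
    "WEATHER": ["Clear", "Unknown", "Raining"],
    "ROADCOND": ["Dry", "Unknown", "Wet"],
    "LIGHTCOND": ["Daylight", "Unknown", None],
})
cleaned = mask_unknown(sample)
print(cleaned.isna().sum())
```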

In [17]:
df['SPEEDING'].value_counts()

Y    9929
Name: SPEEDING, dtype: int64

In [18]:
df['SEGLANEKEY'].value_counts()

0        218489
6532         19
6078         19
12162        18
10336        15
          ...  
9803          1
14281         1
4178          1
6355          1
9402          1
Name: SEGLANEKEY, Length: 2101, dtype: int64

In [19]:
df['CROSSWALKKEY'].value_counts()

0         217283
523609        19
520838        15
524265        13
525567        13
           ...  
525111         1
523080         1
521033         1
523208         1
521927         1
Name: CROSSWALKKEY, Length: 2343, dtype: int64
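
In both `SEGLANEKEY` and `CROSSWALKKEY`, the key 0 accounts for almost all rows. The short calculation below makes that concrete using the row count from `df.shape` and the counts printed above. Reading 0 as "no lane segment or crosswalk associated" is an assumption; the source does not state what the value means.

```python
# Row count from df.shape and zero counts from the value_counts() outputs above
total_rows = 221525
zero_counts = {"SEGLANEKEY": 218489, "CROSSWALKKEY": 217283}

# Fraction of rows whose key is 0, per column
zero_share = {name: count / total_rows for name, count in zero_counts.items()}
for name, share in zero_share.items():
    print(f"{name}: {share:.1%} of rows have key 0")
```

Under that assumption, a binary indicator such as `df['CROSSWALKKEY'].ne(0)` may carry most of the usable signal in these columns.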