## This data science project aims to predict the likelihood of a road accident and its severity, to help road users decide whether to make a journey and how to adjust their driving habits. The dataset contains details such as severity, collision type, injuries, fatalities and location.

In [2]:
import pandas as pd
import numpy as np

### About dataset

This dataset describes past road accidents and their severity. The __Data-Collisions.csv__ data set contains 194673 accident records, each with the location of the accident and its severity. It includes, among others, the following fields:

| Field          | Description                                                                           |
|----------------|---------------------------------------------------------------------------------------|
| Location       | Location of the collision                                                             |
| Severity       | A code that corresponds to the severity of the collision                              |
| severity Desc  | A detailed description of the severity of the collision                               |
| Ped count      | The number of pedestrians involved in the collision                                   |
| SeriousInjuries| The number of serious injuries in the collision                                       |
| Injuries       | The total number of injuries in the collision                                         |
| Fatalities     | The number of fatalities in the collision                                             |
| Weather        | Weather conditions at the time of the collision                                       |
| COLLISIONTYPE  | Collision type                                                                        |
| PERSONCOUNT    | The total number of people involved in the collision                                  |
| PEDCYLCOUNT    | The number of bicycles involved in the collision. This is entered by the state.       |
| VEHCOUNT       | The number of vehicles involved in the collision. This is entered by the state.       |
| INCDATE        | The date of the incident                                                              |
| INCDTTM        | The date and time of the incident                                                     |
| JUNCTIONTYPE   | Category of junction at which the collision took place                                |
| SDOT_COLCODE   | A code given to the collision by SDOT                                                 |
| SDOT_COLDESC   | A description of the collision corresponding to the collision code                    |
| INATTENTIONIND | Whether or not the collision was due to inattention (Y/N)                             |
| UNDERINFL      | Whether or not a driver involved was under the influence of drugs or alcohol          |
| ROADCOND       | The condition of the road during the collision                                        |
| LIGHTCOND      | The light conditions during the collision                                             |
| PEDROWNOTGRNT  | Whether or not the pedestrian right of way was not granted (Y/N)                      |
| SPEEDING       | Whether or not speeding was a factor in the collision (Y/N)                           |
| CROSSWALKKEY   | A key for the crosswalk at which the collision occurred                               |
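
Several of the fields above are Y/N indicators. The summary statistics further down report a single unique value (`Y`) for `SPEEDING` and `PEDROWNOTGRNT`, which suggests that a blank entry may stand for "no". The sketch below shows one way to turn such columns into booleans under that assumption; the `sample` frame and its values are made up for illustration and are not taken from the data set.

```python
import pandas as pd

# Hypothetical rows standing in for the real data; None marks a blank cell.
sample = pd.DataFrame({
    "SPEEDING": ["Y", None, None],
    "PEDROWNOTGRNT": [None, "Y", None],
})

# Assumption: a blank cell means the factor was not recorded as present.
flag_cols = ["SPEEDING", "PEDROWNOTGRNT"]
flags = sample[flag_cols].eq("Y")  # True only where the cell is exactly "Y"

print(flags)
print(flags.sum())  # number of flagged collisions per indicator
```

If blanks actually mean "unknown" rather than "no", keeping them as missing values would be the safer choice.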

### Load data from CSV File

In [6]:
df=pd.read_csv("Data-Collisions.csv", parse_dates=["INCDATE"])
df.info()
df.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194673 entries, 0 to 194672
Data columns (total 38 columns):
SEVERITYCODE      194673 non-null int64
X                 189339 non-null float64
Y                 189339 non-null float64
OBJECTID          194673 non-null int64
INCKEY            194673 non-null int64
COLDETKEY         194673 non-null int64
REPORTNO          194673 non-null object
STATUS            194673 non-null object
ADDRTYPE          192747 non-null object
INTKEY            65070 non-null float64
LOCATION          191996 non-null object
EXCEPTRSNCODE     84811 non-null object
EXCEPTRSNDESC     5638 non-null object
SEVERITYCODE.1    194673 non-null int64
SEVERITYDESC      194673 non-null object
COLLISIONTYPE     189769 non-null object
PERSONCOUNT       194673 non-null int64
PEDCOUNT          194673 non-null int64
PEDCYLCOUNT       194673 non-null int64
VEHCOUNT          194673 non-null int64
INCDATE           194673 non-null datetime64[ns, UTC]
INCDTTM           194673 

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [8]:
df.describe(include="all")

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
count,194673.0,189339.0,189339.0,194673.0,194673.0,194673.0,194673.0,194673,192747,65070.0,...,189661,189503,4667,114936.0,9333,194655.0,189769,194673.0,194673.0,194673
unique,,,,,,,194670.0,2,3,,...,9,9,1,,1,115.0,62,,,2
top,,,,,,,1782439.0,Matched,Block,,...,Dry,Daylight,Y,,Y,32.0,One parked--one moving,,,N
freq,,,,,,,2.0,189786,126926,,...,124510,116137,4667,,9333,27612.0,44421,,,187457
first,,,,,,,,,,,...,,,,,,,,,,
last,,,,,,,,,,,...,,,,,,,,,,
mean,1.298901,-122.330518,47.619543,108479.36493,141091.45635,141298.811381,,,,37558.450576,...,,,,7972521.0,,,,269.401114,9782.452,
std,0.457778,0.029976,0.056157,62649.722558,86634.402737,86986.54211,,,,51745.990273,...,,,,2553533.0,,,,3315.776055,72269.26,
min,1.0,-122.419091,47.495573,1.0,1001.0,1001.0,,,,23807.0,...,,,,1007024.0,,,,0.0,0.0,
25%,1.0,-122.348673,47.575956,54267.0,70383.0,70383.0,,,,28667.0,...,,,,6040015.0,,,,0.0,0.0,


Initial observation shows that the columns are very unevenly populated: the most complete columns have 194673 non-null values, while the sparsest one in the summary above (PEDROWNOTGRNT) has only 4667.
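
The gap in non-null counts can be quantified per column rather than read off the summary. The following is a minimal sketch on a made-up miniature frame (`sample` and its values are hypothetical); on the real data the same expressions would be applied to `df`.

```python
import pandas as pd

# Hypothetical miniature stand-in for the collisions frame.
sample = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 1],
    "ROADCOND": ["Dry", "Wet", None, "Dry"],
    "PEDROWNOTGRNT": [None, "Y", None, None],
})

# Non-null count and share of missing values per column, sparsest first.
missing = pd.DataFrame({
    "non_null": sample.notna().sum(),
    "missing_share": sample.isna().mean(),
}).sort_values("missing_share", ascending=False)

print(missing)
```

Sorting by `missing_share` puts the columns that may need to be dropped or imputed at the top.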

In [10]:
df['SEVERITYCODE'].value_counts()

1    136485
2     58188
Name: SEVERITYCODE, dtype: int64
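
The two severity classes are not equally represented. As a small worked check, the counts printed above can be converted into class shares; the snippet re-enters those two counts by hand rather than reading the CSV.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({1: 136485, 2: 58188}, name="SEVERITYCODE")

# Share of each severity class in the 194673 records.
shares = (counts / counts.sum()).round(4)
print(shares)  # class 1 -> 0.7011, class 2 -> 0.2989

# Ratio of the majority class to the minority class.
ratio = counts[1] / counts[2]
print(round(ratio, 2))
```

On the loaded frame, `df['SEVERITYCODE'].value_counts(normalize=True)` gives the same shares directly. Roughly 70% of records fall in class 1, which is worth keeping in mind when a classifier is evaluated later.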

In [12]:
df['ROADCOND'].value_counts()

Dry               124510
Wet                47474
Unknown            15078
Ice                 1209
Snow/Slush          1004
Other                132
Standing Water       115
Sand/Mud/Dirt         75
Oil                   64
Name: ROADCOND, dtype: int64

#### Code and description provided by the state that describe the collision.

In [15]:
df['ST_COLDESC'].value_counts()

One parked--one moving                                                                   44421
Entering at angle                                                                        34674
From same direction - both going straight - one stopped - rear-end                       25771
Fixed object                                                                             13554
From same direction - both going straight - both moving - sideswipe                      12777
From opposite direction - one left turn - one straight                                   10324
From same direction - both going straight - both moving - rear-end                        7629
Vehicle - Pedalcyclist                                                                    4701
From same direction - all others                                                          4537
From same direction - one left turn - one straight                                        3093
From same direction - one right turn - one straigh

In [18]:
df2=df[['SEVERITYCODE','ROADCOND']]
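
One way to relate the two selected columns is a row-normalised cross-tabulation, which gives the share of each severity code within each road condition. This is only a sketch of a possible next step, not necessarily what the notebook goes on to do; the `sample` rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows with the same two columns as df2.
sample = pd.DataFrame({
    "SEVERITYCODE": [1, 1, 2, 1, 2, 2],
    "ROADCOND": ["Dry", "Dry", "Dry", "Wet", "Wet", "Ice"],
})

# Share of each severity code within each road condition (rows sum to 1).
severity_by_road = pd.crosstab(
    sample["ROADCOND"], sample["SEVERITYCODE"], normalize="index"
)
print(severity_by_road)
```

On the real data, `pd.crosstab(df2['ROADCOND'], df2['SEVERITYCODE'], normalize='index')` would apply the same idea; rows with a missing `ROADCOND` are left out of the table.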

In [19]:
import matplotlib.pyplot as plt 
import seaborn as sns 
%matplotlib inline