<h1>Coursera IBM Data Science Capstone</h1>
<h3>This notebook will be used for the IBM Certified Data Science Capstone through Coursera.</h3>

## Table of contents
* [Introduction](#introduction)
* [Problem](#Problem)
* [Data](#Data)
* [Methodology](#Methodology)
* [Analysis](#Analysis)
* [Results](#Results)
* [Conclusion](#Conclusion)

## Introduction <a name="introduction"></a>

About 115 million cars are on the road every day. Behind the wheel of each one is a driver who wants to arrive safely. As new technologies are adopted into automobiles, driving has definitely become safer. However, accidents remain a sobering reminder of the dangers that exist every time we make a trip. To limit those risks, we require training for all drivers, wear seatbelts, and drive carefully according to weather and road conditions.

## Problem <a name="Problem"></a>

We value the safety of every person on the road and want to limit the possibility of severe or fatal car accidents. The objective of this project is to predict the severity of an accident based on attributes in our data, such as road conditions, road contour, and light conditions.

To predict accident severity, a supervised machine learning classification model will be developed.
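
As a rough illustration of what a supervised classification setup involves, the sketch below separates input attributes from the severity label and holds out a test set that keeps every severity class represented. It is a minimal sketch under stated assumptions: the `toy` frame, its column names, and its values are hypothetical and are not the Denver schema, and the 30% hold-out size is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
toy = pd.DataFrame({
    "road_condition": ["DRY", "WET", "ICY", "DRY", "WET", "DRY", "ICY", "DRY", "WET", "DRY"],
    "light_condition": ["DAY", "DARK", "DARK", "DAY", "DAY", "DARK", "DAY", "DAY", "DARK", "DAY"],
    "severity": [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
})

# One-hot encode the categorical attributes so a classifier can consume them.
X = pd.get_dummies(toy[["road_condition", "light_condition"]])
y = toy["severity"]

# Sample the test rows separately within each severity class (a stratified hold-out).
test_idx = y.groupby(y).sample(frac=0.3, random_state=0).index

X_test, y_test = X.loc[test_idx], y.loc[test_idx]
X_train, y_train = X.drop(index=test_idx), y.drop(index=test_idx)

print(len(X_train), "training rows,", len(X_test), "test rows")
```

Sampling within each class matters when severe accidents are rare, because a purely random split could leave the test set with no severe cases at all.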

<h4>Target Audience</h4>

1. CDOT (Colorado Department of Transportation): By being able to predict the severity of an accident based on the conditions, CDOT can effectively plan road closures and utilize signs to warn drivers of the conditions.
2. First Responders: First responders can predict the severity of the accident they are responding to, allowing them to be better prepared when they arrive on scene.
3. Drivers in Denver: Drivers can predict the severity of accidents before leaving, allowing them to alter travel plans if needed.

## Data <a name="Data"></a>

I will be analyzing traffic accident data from the Denver area. The dataset comes from the Denver open data catalog; specifically, it is the "Traffic Accidents - Comma-Separated Values" file provided here:

https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-traffic-accidents

This dataset initially provided 44 attributes and 181,943 rows of motor vehicle crashes reported to the Denver Police Department that occurred within the City and County of Denver during the previous five calendar years plus the current year to date.

Below I have started the data exploration phase. I started by loading the CSV into a dataframe and looking at the head of the dataframe. I then created my Severity column based on the FATALITIES and SERIOUSLY_INJURED columns. Lastly, I looked at the shape of my data, as well as the data types and the value counts of Severity in my dataset.

In [9]:
import pandas as pd
import numpy as np

# Load the Denver traffic accidents CSV, using the first column (OBJECTID_1) as the index.
df = pd.read_csv('https://www.denvergov.org/media/gis/DataCatalog/traffic_accidents/csv/traffic_accidents.csv', index_col=0)
df.head(5)

OBJECTID_1,INCIDENT_ID,OFFENSE_ID,OFFENSE_CODE,OFFENSE_CODE_EXTENSION,TOP_TRAFFIC_ACCIDENT_OFFENSE,FIRST_OCCURRENCE_DATE,LAST_OCCURRENCE_DATE,REPORTED_DATE,INCIDENT_ADDRESS,GEO_X,...,TU2_VEHICLE_MOVEMENT,TU2_DRIVER_ACTION,TU2_DRIVER_HUMANCONTRIBFACTOR,TU2_PEDESTRIAN_ACTION,SERIOUSLY_INJURED,FATALITIES,FATALITY_MODE_1,FATALITY_MODE_2,SERIOUSLY_INJURED_MODE_1,SERIOUSLY_INJURED_MODE_2
1,20193963.0,2019396354010,5401,0,TRAF - ACCIDENT - HIT & RUN,2019-01-02 20:50:00,,2019-01-02 21:01:00,W COLFAX AVE / N FEDERAL BLVD,3133517.0,...,GOING STRAIGHT,OTHER,NO APPARENT,,0.0,0.0,,,,
2,20193966.0,2019396654410,5441,0,TRAF - ACCIDENT,2019-01-02 20:59:00,,2019-01-02 20:59:00,N DELAWARE ST / W 8TH AVE,3142641.0,...,GOING STRAIGHT,OTHER,NO APPARENT,,0.0,0.0,,,,
3,20193991.0,2019399154010,5401,0,TRAF - ACCIDENT - HIT & RUN,2019-01-02 09:30:00,,2019-01-02 21:14:00,5000 E 33RD AVE,3160492.0,...,PARKED,,,,0.0,0.0,,,,
4,20194077.0,2019407754410,5441,0,TRAF - ACCIDENT,2019-01-02 22:03:00,,2019-01-02 22:46:00,W 48TH AVE / N BANNOCK ST,3143001.0,...,GOING STRAIGHT,OTHER,NO APPARENT,,0.0,0.0,,,,
5,20194189.0,2019418954200,5420,0,TRAF - ACCIDENT - DUI/DUID,2019-01-02 23:19:00,,2019-01-02 23:34:00,I70 HWYWB / N PECOS ST,3138717.0,...,STOPPED IN TRAFFIC,OTHER,NO APPARENT,,0.0,0.0,,,,
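
The timestamp columns in the preview above arrive as plain text when read from CSV, so they need converting before any time-based analysis. The sketch below shows one way to do that, using three timestamp pairs copied from the sample rows; the derived column names (`hour`, `day_of_week`, `report_delay_min`) are hypothetical and are not part of the dataset.

```python
import pandas as pd

# Timestamps copied from the sample rows shown in the preview.
sample = pd.DataFrame({
    "FIRST_OCCURRENCE_DATE": ["2019-01-02 20:50:00", "2019-01-02 09:30:00", "2019-01-02 23:19:00"],
    "REPORTED_DATE": ["2019-01-02 21:01:00", "2019-01-02 21:14:00", "2019-01-02 23:34:00"],
})

# Convert the text columns to datetime64 so the .dt accessors become available.
for col in ["FIRST_OCCURRENCE_DATE", "REPORTED_DATE"]:
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d %H:%M:%S")

# Hypothetical derived features: time of day, weekday, and reporting delay in minutes.
sample["hour"] = sample["FIRST_OCCURRENCE_DATE"].dt.hour
sample["day_of_week"] = sample["FIRST_OCCURRENCE_DATE"].dt.day_name()
delay = sample["REPORTED_DATE"] - sample["FIRST_OCCURRENCE_DATE"]
sample["report_delay_min"] = delay.dt.total_seconds() / 60

print(sample[["hour", "day_of_week", "report_delay_min"]])
```

On the full file, passing `parse_dates=[...]` to `pd.read_csv` would be another way to get the same conversion at load time.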


In [10]:
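def label_severity(row):
    """Return 0 (no serious injury or fatality), 1 (serious injury), or 2 (fatality)."""
    if row['FATALITIES'] == 0 and row['SERIOUSLY_INJURED'] == 0:
        return 0
    elif row['FATALITIES'] == 0 and row['SERIOUSLY_INJURED'] > 0:
        return 1
    elif row['FATALITIES'] > 0:
        return 2
    # Rows with missing counts match no branch above and stay unlabeled (NaN).
    return np.nan

# Apply the rule row by row to build the target column.
df['Severity'] = df.apply(label_severity, axis=1)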
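
The same labeling rule can also be written without a row-by-row `apply`, which tends to be slow on a frame of this size. The following is an alternative sketch, not the approach used in this notebook: it applies `np.select` to a small made-up frame (`toy` and its values are hypothetical) and leaves rows with missing counts as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering each case, including one with missing counts.
toy = pd.DataFrame({
    "FATALITIES": [0, 0, 1, np.nan, 2],
    "SERIOUSLY_INJURED": [0, 2, 0, np.nan, 1],
})

# Conditions are checked in order; the first one that is True picks the label.
conditions = [
    (toy["FATALITIES"] == 0) & (toy["SERIOUSLY_INJURED"] == 0),
    (toy["FATALITIES"] == 0) & (toy["SERIOUSLY_INJURED"] > 0),
    toy["FATALITIES"] > 0,
]
labels = [0.0, 1.0, 2.0]

# Rows where no condition holds (for example, missing counts) get NaN.
toy["Severity"] = np.select(conditions, labels, default=np.nan)

print(toy)
```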

In [11]:
df.shape

(181943, 45)

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 181943 entries, 1 to 181943
Data columns (total 45 columns):
INCIDENT_ID                      181943 non-null float64
OFFENSE_ID                       181943 non-null int64
OFFENSE_CODE                     181943 non-null int64
OFFENSE_CODE_EXTENSION           181943 non-null int64
TOP_TRAFFIC_ACCIDENT_OFFENSE     181943 non-null object
FIRST_OCCURRENCE_DATE            181943 non-null object
LAST_OCCURRENCE_DATE             616 non-null object
REPORTED_DATE                    181943 non-null object
INCIDENT_ADDRESS                 181943 non-null object
GEO_X                            174770 non-null float64
GEO_Y                            174770 non-null float64
GEO_LON                          174770 non-null float64
GEO_LAT                          174770 non-null float64
DISTRICT_ID                      179166 non-null float64
PRECINCT_ID                      174745 non-null float64
NEIGHBORHOOD_ID                  174745 non-null
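
The non-null counts above show how sparse some columns are. As a small worked example, the sketch below turns a few of the counts listed in this output into missing-value percentages; only counts that appear above are used, and the variable names are illustrative.

```python
# Total rows and non-null counts copied from the df.info() output above.
total_rows = 181943
non_null = {
    "LAST_OCCURRENCE_DATE": 616,
    "GEO_X": 174770,
    "DISTRICT_ID": 179166,
    "PRECINCT_ID": 174745,
}

# Share of rows with no value in each column, as a percentage.
missing_pct = {
    col: round(100 * (total_rows - count) / total_rows, 2)
    for col, count in non_null.items()
}

# List the sparsest columns first.
for col, pct in sorted(missing_pct.items(), key=lambda item: item[1], reverse=True):
    print(f"{col:<22} {pct:6.2f}% missing")
```

On the loaded dataframe, `df.isna().mean()` gives the same shares as fractions for every column at once. A column such as LAST_OCCURRENCE_DATE, which is empty in almost every row, is unlikely to be useful as a model input.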

In [13]:
df['Severity'].value_counts()

0.0    177512
1.0      3722
2.0       385
Name: Severity, dtype: int64
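
These counts show a heavily imbalanced target. The sketch below works through the arithmetic using only the numbers reported above: the share of each severity class among labeled rows, and how many of the 181,943 rows received no label at all.

```python
# Counts copied from the value_counts() output and the df.shape output above.
total_rows = 181943
severity_counts = {0: 177512, 1: 3722, 2: 385}

labeled_rows = sum(severity_counts.values())
unlabeled_rows = total_rows - labeled_rows  # rows where Severity is NaN

# Percentage of labeled rows falling in each severity class.
class_share = {
    label: 100 * count / labeled_rows
    for label, count in severity_counts.items()
}

for label, share in class_share.items():
    print(f"Severity {label}: {share:.2f}% of labeled rows")
print(f"Unlabeled rows: {unlabeled_rows}")
```

Because class 0 makes up roughly 97.7% of labeled rows, a model that always predicts "not severe" would already reach about that accuracy, so accuracy alone would be a weak measure of how well severe accidents are predicted.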

## Methodology <a name="Methodology"></a>

(Methodology Section)

## Analysis <a name="Analysis"></a>

(Analysis Section)

## Results <a name="Results"></a>

(Results Section)

## Conclusion <a name="Conclusion"></a>

(Conclusion Section)