# Introduction

When involved in a collision, cyclists are six times more likely to incur an injury than property damage alone. By comparison, in collisions involving only cars, injuries occur in only one out of every four instances. A person is therefore more than twice as likely to be injured in a collision while riding a bicycle than while travelling in another vehicle.

Cyclists can do many things to avoid collisions, but some factors inevitably remain completely out of their control. This model seeks to estimate how much factors such as weather, road condition, and light condition contribute to collisions that injure cyclists. Cyclists can then better judge, based on the situation, when to consider a different form of transportation (if possible) or when to be more on guard against a particular road condition.

# Data Section

#### The data set used for this analysis comes from the Seattle Department of Transportation and catalogs collisions from 2004 to the present.

In [1]:
url = 'https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv'

In [2]:
import pandas as pd

data = pd.read_csv(url)
data.head()


| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |


In [17]:
print(data.shape)
data.columns

(194673, 38)


Index(['SEVERITYCODE', 'X', 'Y', 'OBJECTID', 'INCKEY', 'COLDETKEY', 'REPORTNO',
       'STATUS', 'ADDRTYPE', 'INTKEY', 'LOCATION', 'EXCEPTRSNCODE',
       'EXCEPTRSNDESC', 'SEVERITYCODE.1', 'SEVERITYDESC', 'COLLISIONTYPE',
       'PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INCDATE',
       'INCDTTM', 'JUNCTIONTYPE', 'SDOT_COLCODE', 'SDOT_COLDESC',
       'INATTENTIONIND', 'UNDERINFL', 'WEATHER', 'ROADCOND', 'LIGHTCOND',
       'PEDROWNOTGRNT', 'SDOTCOLNUM', 'SPEEDING', 'ST_COLCODE', 'ST_COLDESC',
       'SEGLANEKEY', 'CROSSWALKKEY', 'HITPARKEDCAR'],
      dtype='object')

#### The data is filtered down to only the essential attributes required for the model.

In [30]:
filter_data = data[['SEVERITYCODE','OBJECTID','COLLISIONTYPE','PERSONCOUNT','PEDCOUNT','PEDCYLCOUNT','VEHCOUNT','INCDTTM','JUNCTIONTYPE','ROADCOND','LIGHTCOND','WEATHER','ST_COLCODE']]

#### The data is then further filtered to only include collisions involving a bicycle.

In [31]:
bike_data = filter_data[filter_data['PEDCYLCOUNT'] > 0]
bike_data.head()

| | SEVERITYCODE | OBJECTID | COLLISIONTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | INCDTTM | JUNCTIONTYPE | ROADCOND | LIGHTCOND | WEATHER | ST_COLCODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7 | 2 | 9 | Cycles | 3 | 0 | 1 | 1 | 4/15/2020 5:47:00 PM | At Intersection (intersection related) | Dry | Daylight | Clear | 5 |
| 25 | 2 | 34 | Cycles | 2 | 0 | 1 | 1 | 4/25/2019 9:40:00 AM | Mid-Block (not related to intersection) | Dry | Daylight | Clear | 5 |
| 52 | 2 | 62 | Cycles | 3 | 0 | 1 | 1 | 3/29/2013 11:53:00 AM | At Intersection (intersection related) | Dry | Unknown | Clear | 45 |
| 79 | 1 | 91 | Cycles | 2 | 0 | 1 | 1 | 3/28/2013 3:30:00 PM | At Intersection (intersection related) | Dry | Daylight | Clear | 45 |
| 90 | 2 | 103 | Other | 1 | 0 | 1 | 0 | 10/11/2004 4:00:00 PM | Mid-Block (but intersection related) | Dry | Daylight | Clear | 52 |


#### Grouping the data by the attributes that will feed the model shows where feature data may need to be combined or re-engineered to produce more accurate results.

In [44]:
print(bike_data.shape)
bike_data[['SEVERITYCODE','OBJECTID']].groupby(['SEVERITYCODE']).count()

(5484, 13)


| SEVERITYCODE | OBJECTID |
| --- | --- |
| 1 | 679 |
| 2 | 4805 |
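
To make the class balance explicit, the two counts above can be converted into a share and an odds figure. This is a minimal sketch that copies the counts by hand instead of reading them from `bike_data`; the variable names are illustrative only.

```python
# Counts copied from the SEVERITYCODE table above (bicycle collisions only).
property_damage = 679   # SEVERITYCODE 1
injury = 4805           # SEVERITYCODE 2

total = property_damage + injury
injury_share = injury / total            # fraction of collisions with an injury
injury_odds = injury / property_damage   # injury collisions per property-damage collision

print(f"total collisions: {total}")
print(f"injury share: {injury_share:.1%}")
print(f"injury odds: {injury_odds:.2f} to 1")
```

The two classes are heavily imbalanced, which is worth keeping in mind when judging the accuracy of any classifier trained on this subset.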


In [40]:
bike_data[['SEVERITYCODE','WEATHER']].groupby(['WEATHER']).count().sort_values('SEVERITYCODE',ascending=False)

| WEATHER | SEVERITYCODE |
| --- | --- |
| Clear | 3953 |
| Overcast | 800 |
| Raining | 597 |
| Unknown | 113 |
| Other | 7 |
| Fog/Smog/Smoke | 5 |
| Snowing | 2 |
| Blowing Sand/Dirt | 1 |
| Sleet/Hail/Freezing Rain | 1 |
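
The table above counts collisions per weather category but does not show how severity differs between categories. One possible way to see that is a row-normalised cross-tabulation. The sketch below runs on a small invented frame named `toy` (not the real `bike_data`), so the printed shares are illustrative only.

```python
import pandas as pd

# Toy rows standing in for bike_data; the values are invented for illustration.
toy = pd.DataFrame({
    "WEATHER": ["Clear", "Clear", "Clear", "Raining", "Raining", "Overcast"],
    "SEVERITYCODE": [2, 2, 1, 2, 1, 2],
})

# Row-normalised cross-tabulation: each row sums to 1, so the column for
# SEVERITYCODE 2 is the share of collisions in that weather that caused injury.
severity_by_weather = pd.crosstab(
    toy["WEATHER"], toy["SEVERITYCODE"], normalize="index"
)
print(severity_by_weather)
```

Applied to the real data, the same call could be repeated for `ROADCOND`, `LIGHTCOND`, and `JUNCTIONTYPE`, although shares for categories with only a handful of rows would be unreliable.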


In [39]:
bike_data[['SEVERITYCODE','ROADCOND']].groupby(['ROADCOND']).count().sort_values('SEVERITYCODE',ascending=False)

| ROADCOND | SEVERITYCODE |
| --- | --- |
| Dry | 4402 |
| Wet | 934 |
| Unknown | 122 |
| Ice | 10 |
| Snow/Slush | 4 |
| Standing Water | 3 |
| Other | 2 |
| Sand/Mud/Dirt | 2 |


In [41]:
bike_data[['SEVERITYCODE','JUNCTIONTYPE']].groupby(['JUNCTIONTYPE']).count().sort_values('SEVERITYCODE',ascending=False)

| JUNCTIONTYPE | SEVERITYCODE |
| --- | --- |
| At Intersection (intersection related) | 3104 |
| Mid-Block (not related to intersection) | 1487 |
| Driveway Junction | 570 |
| Mid-Block (but intersection related) | 248 |
| At Intersection (but not related to intersection) | 34 |
| Ramp Junction | 2 |


In [42]:
bike_data[['SEVERITYCODE','LIGHTCOND']].groupby(['LIGHTCOND']).count().sort_values('SEVERITYCODE',ascending=False)

| LIGHTCOND | SEVERITYCODE |
| --- | --- |
| Daylight | 4085 |
| Dark - Street Lights On | 917 |
| Dusk | 211 |
| Dawn | 107 |
| Unknown | 93 |
| Dark - No Street Lights | 33 |
| Dark - Street Lights Off | 29 |
| Other | 2 |
| Dark - Unknown Lighting | 1 |
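
Several categories in the tables above appear only a handful of times, which is one place where feature data may need to be combined. A minimal sketch of one way to do that follows. The helper `collapse_rare`, its `min_count` threshold, and the toy series are all hypothetical and not part of the original notebook.

```python
import pandas as pd

def collapse_rare(series, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; substitute other_label for the rest.
    return series.where(~series.isin(rare), other_label)

# Toy values standing in for bike_data['LIGHTCOND'].
toy_light = pd.Series(
    ["Daylight"] * 5 + ["Dusk"] * 3 + ["Dark - Unknown Lighting", "Other"]
)
collapsed = collapse_rare(toy_light, min_count=3)
print(collapsed.value_counts())
```

The threshold is a judgment call. Merging by meaning (for example, grouping the "Dark" variants together) may be preferable to merging purely by frequency.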


### A pair of classification models separating personal injury collisions (SEVERITYCODE = 2) from property damage collisions (SEVERITYCODE = 1) will let us train a unified model that illustrates the factors cyclists should be most aware of before setting out on a journey.
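
The source does not say which classifiers are used, so the sketch below only shows one plausible way to prepare inputs for any of them: a binary target derived from `SEVERITYCODE` and one-hot encoded condition columns. The frame `toy` and the choice of columns are assumptions made for illustration.

```python
import pandas as pd

# Toy rows standing in for bike_data; the values are invented for illustration.
toy = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 2, 2],
    "WEATHER": ["Clear", "Raining", "Clear", "Overcast"],
    "ROADCOND": ["Dry", "Wet", "Dry", "Dry"],
    "LIGHTCOND": ["Daylight", "Dusk", "Daylight", "Dark - Street Lights On"],
})

# Binary target: 1 for an injury collision (SEVERITYCODE 2), 0 for property damage.
y = (toy["SEVERITYCODE"] == 2).astype(int)

# One indicator column per category value, e.g. WEATHER_Clear, ROADCOND_Wet.
X = pd.get_dummies(toy[["WEATHER", "ROADCOND", "LIGHTCOND"]])

print(X.columns.tolist())
print(y.tolist())
```

`X` and `y` in this form can be passed to most classification libraries. Which features to keep and how to handle the `Unknown` category remain open modelling choices.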