# <center> Capstone Project </center> <br>

This Jupyter Notebook was created for the IBM course Applied Data Science Capstone on Coursera. The aim of the project is to work through a case study: predicting the severity of an accident using machine learning models and data science techniques learned in previous courses. To do that, we first try to gain some insight from the data: under which weather, road and lighting conditions do incidents occur more often? Are they more likely to happen in winter or in summer? Does speeding increase the severity of an accident? Specifically, we try to find out which factors have the most impact on the severity of an accident, and then train machine learning models to predict the severity of a possible accident. <br>

The problem arises from a lack of warnings and information about weather and road conditions for drivers, which would allow them to drive more carefully or, if possible, change their travel plans. Such warnings, combined with predictions of how severe a car accident could be under the given weather and road conditions, may reduce the number of accidents, casualties and injuries, which in turn reduces the cost of damage. To develop methods that reduce the damage of potential accidents, it is crucial to determine the factors leading to them. Because resources are limited, the priority is to create strategies aimed at minimizing the risk of severe accidents, hence we look for factors that lead to severe accidents in particular. As the project's objective is to determine what causes more severe incidents, this information could be most useful for insurance companies and governments (which could, for example, improve lighting conditions at certain intersections or roads), although it may also be useful for car manufacturers and road construction companies.

## Case Study

The given data set *Data-Collisions.csv* contains all collisions provided by the Seattle Police Department from 2004 to present. The attributes in the data set include:
- Time and Location
    - Coordinates of the collision **X** and **Y**
    - Description of the general location of the collision **LOCATION**
    - Type of the location *(alley, block or intersection)* **ADDRTYPE**
    - Category of junction at which collision took place **JUNCTIONTYPE**
    - Key that corresponds to the intersection associated with a collision **INTKEY**
    - Key for the lane segment in which the collision occurred **SEGLANEKEY**
    - Key for the crosswalk at which the collision occurred **CROSSWALKKEY**
    - Date of the incident **INCDATE**
    - Date and time of the incident **INCDTTM**
- Conditions
    - Description of the weather conditions during the time of the collision **WEATHER**
    - Condition of the road during the collision **ROADCOND**
    - Light conditions during the collision **LIGHTCOND**
- Involved parties
    - Number of people **PERSONCOUNT**
    - Number of pedestrians **PEDCOUNT**
    - Number of bicycles **PEDCYLCOUNT**
    - Number of vehicles **VEHCOUNT** 
- Behaviour of involved parties
    - Whether or not speeding was a factor in the collision (Y/N) **SPEEDING**
    - Whether or not the collision involved hitting a parked car (Y/N) **HITPARKEDCAR**
    - Whether or not the pedestrian right of way was not granted (Y/N) **PEDROWNOTGRNT**
    - Whether or not collision was due to inattention (Y/N) **INATTENTIONIND**
    - Whether or not a driver involved was under the influence of drugs or alcohol **UNDERINFL**
- Details of the incident
    - Code given to the collision by SDOT **SDOT_COLCODE** (for more information see the [State Collision Code Dictionary](https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf))
    - Description of the collision corresponding to the collision code **SDOT_COLDESC**
    - Code provided by the state that describes the collision **ST_COLCODE**
    - Description that corresponds to the state’s coding designation **ST_COLDESC**
    - Number given to the collision by SDOT **SDOTCOLNUM**
    - Collision type **COLLISIONTYPE**
- Severity of the incident
    - Code that corresponds to the severity *(3—fatality, 2b—serious injury, 2—injury, 1—property damage, 0—unknown)* **SEVERITYCODE**
    - Detailed description of the severity **SEVERITYDESC**

The data set attributes also include a unique number for each incident **OBJECTID**, a report number **REPORTNO**, a copy of the 'SEVERITYCODE' column **SEVERITYCODE.1**, the keys **INCKEY** and **COLDETKEY**, whether or not 'INCKEY' matches 'COLDETKEY' **STATUS**, and the columns **EXCEPTRSNCODE** and **EXCEPTRSNDESC**. <br>

We are now ready to read the data.

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline

In [2]:
!wget -O Data-Collisions.csv https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv

--2020-09-14 10:27:13--  https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv
Resolving s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)... 67.228.254.196
Connecting to s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73917638 (70M) [text/csv]
Saving to: ‘Data-Collisions.csv’


2020-09-14 10:27:16 (36.0 MB/s) - ‘Data-Collisions.csv’ saved [73917638/73917638]



In [3]:
df = pd.read_csv("Data-Collisions.csv")
df.head()



Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


We use the following command to find out the data type and the number of non-null values of each column.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194673 entries, 0 to 194672
Data columns (total 38 columns):
SEVERITYCODE      194673 non-null int64
X                 189339 non-null float64
Y                 189339 non-null float64
OBJECTID          194673 non-null int64
INCKEY            194673 non-null int64
COLDETKEY         194673 non-null int64
REPORTNO          194673 non-null object
STATUS            194673 non-null object
ADDRTYPE          192747 non-null object
INTKEY            65070 non-null float64
LOCATION          191996 non-null object
EXCEPTRSNCODE     84811 non-null object
EXCEPTRSNDESC     5638 non-null object
SEVERITYCODE.1    194673 non-null int64
SEVERITYDESC      194673 non-null object
COLLISIONTYPE     189769 non-null object
PERSONCOUNT       194673 non-null int64
PEDCOUNT          194673 non-null int64
PEDCYLCOUNT       194673 non-null int64
VEHCOUNT          194673 non-null int64
INCDATE           194673 non-null object
INCDTTM           194673 non-null obje

To understand our data even better we check what values some attributes have.

In [5]:
def list_subtraction(list1, list2): 
    list3 = [value for value in list1 if value not in list2] 
    return list3

column_list = df.columns.values.tolist()
remove_list = ['X','Y','OBJECTID','INCKEY','COLDETKEY','REPORTNO','LOCATION', \
                   'INCDATE','INCDTTM','SDOT_COLCODE','SDOT_COLDESC','SDOTCOLNUM', \
                   'ST_COLCODE','ST_COLDESC','SEGLANEKEY','CROSSWALKKEY']
column_list = list_subtraction(column_list, remove_list)
for column_name in column_list:
    print(column_name,':   ', df[column_name].unique())        

SEVERITYCODE :    [2 1]
STATUS :    ['Matched' 'Unmatched']
ADDRTYPE :    ['Intersection' 'Block' 'Alley' nan]
INTKEY :    [37475.    nan 34387. ... 36056. 38057. 26005.]
EXCEPTRSNCODE :    [' ' nan 'NEI']
EXCEPTRSNDESC :    [nan 'Not Enough Information, or Insufficient Location Information']
SEVERITYCODE.1 :    [2 1]
SEVERITYDESC :    ['Injury Collision' 'Property Damage Only Collision']
COLLISIONTYPE :    ['Angles' 'Sideswipe' 'Parked Car' 'Other' 'Cycles' 'Rear Ended' 'Head On'
 nan 'Left Turn' 'Pedestrian' 'Right Turn']
PERSONCOUNT :    [ 2  4  3  0  1  5  6 16  8  7 11  9 12 17 26 22 10 37 13 36 28 14 53 19
 30 29 23 44 15 32 21 41 27 20 35 43 81 18 25 48 24 34 57 39 47 54 31]
PEDCOUNT :    [0 1 2 3 4 5 6]
PEDCYLCOUNT :    [0 1 2]
VEHCOUNT :    [ 2  3  1  0  4  7  5  6  8 11  9 10 12]
JUNCTIONTYPE :    ['At Intersection (intersection related)'
 'Mid-Block (not related to intersection)' 'Driveway Junction'
 'Mid-Block (but intersection related)'
 'At Intersection (but not related t

### Cleaning and Preparing the Data

To get a better understanding of the timing of the incidents, and to be able to use it later, we convert the **INCDTTM** and **INCDATE** columns into datetime format and extract date information that could be useful: **YEAR**, **MONTH**, **DAYOFWEEK** and **TIMESLOT**.

In [6]:
from datetime import datetime, timedelta

df['INCDTTM'] = pd.to_datetime(df['INCDTTM'])
df['INCDATE'] = pd.to_datetime(df['INCDATE'])

df['YEAR'] = df['INCDATE'].dt.year
df['MONTH'] = df['INCDATE'].dt.month
df['DAYOFWEEK'] = df['INCDATE'].dt.dayofweek

df['INCDTTM1'] = pd.DatetimeIndex(df['INCDTTM']) + timedelta(hours=1)

df['TIMESLOT'] = df['INCDTTM'].dt.strftime('%H').astype(str) + " - " + df['INCDTTM1'].dt.strftime('%H').astype(str)

df.drop(['INCDTTM1'], axis = 1, inplace = True)

df[['INCDATE','INCDTTM','YEAR','MONTH','DAYOFWEEK','TIMESLOT']].head()

Unnamed: 0,INCDATE,INCDTTM,YEAR,MONTH,DAYOFWEEK,TIMESLOT
0,2013-03-27 00:00:00+00:00,2013-03-27 14:54:00,2013,3,2,14 - 15
1,2006-12-20 00:00:00+00:00,2006-12-20 18:55:00,2006,12,2,18 - 19
2,2004-11-18 00:00:00+00:00,2004-11-18 10:20:00,2004,11,3,10 - 11
3,2013-03-29 00:00:00+00:00,2013-03-29 09:26:00,2013,3,4,09 - 10
4,2004-01-28 00:00:00+00:00,2004-01-28 08:04:00,2004,1,2,08 - 09
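
The one-hour time slot can also be derived directly from the hour of the timestamp, without a temporary shifted column. The following is a minimal sketch on a few made-up timestamps, not the notebook's own code; the variable names `timestamps`, `hour` and `timeslot` are chosen for illustration.

```python
import pandas as pd

# A few illustrative timestamps (not taken from the data set).
timestamps = pd.Series(pd.to_datetime([
    '2013-03-27 14:54:00',
    '2004-01-28 08:04:00',
    '2006-12-20 23:10:00',
]))

# Hour of day as an integer between 0 and 23.
hour = timestamps.dt.hour

# Build labels such as '14 - 15'; the modulo wraps 23 around to 00.
timeslot = (hour.map('{:02d}'.format) + ' - '
            + ((hour + 1) % 24).map('{:02d}'.format))
print(timeslot.tolist())
```

Note that `pd.to_datetime` parses a value without a time component as midnight, so if some records carry only a date they would all land in the `00 - 01` slot under either approach.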


Having done that, we know that **INCDATE** provides less information than **INCDTTM** and so we can drop this column.

In [7]:
df.drop(['INCDATE'], axis = 1, inplace = True)

Since the columns **EXCEPTRSNCODE**, **EXCEPTRSNDESC** and **INTKEY** have a lot of missing data and/or contain values that do not provide any information, we will also drop these columns.

In [8]:
df.drop(['EXCEPTRSNCODE', 'EXCEPTRSNDESC','INTKEY'], axis = 1, inplace = True)
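
A decision like this can be backed by the share of missing values per column. The sketch below uses a small made-up frame named `toy` (its values are illustrative, not taken from the data set), and the 50 % threshold is an arbitrary assumption rather than a rule used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the collision data (values are made up).
toy = pd.DataFrame({
    'INTKEY': [37475.0, np.nan, np.nan, np.nan],
    'EXCEPTRSNCODE': [' ', np.nan, 'NEI', np.nan],
    'ADDRTYPE': ['Intersection', 'Block', 'Block', 'Alley'],
})

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns whose missing share reaches the (hypothetical) 50 % threshold.
sparse_columns = missing_share[missing_share >= 0.5].index.tolist()
print(sparse_columns)
```

A blank string such as `' '` is not counted as missing by `isna()`, so columns that encode "no value" that way need a separate check, for example with `unique()` as done earlier.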

The columns **SEVERITYCODE** and **SEVERITYCODE.1** are identical, and **SEVERITYCODE** takes only the values 1 and 2. Since **SEVERITYDESC** is *'Injury Collision'* whenever **SEVERITYCODE** = 2 and *'Property Damage Only Collision'* whenever **SEVERITYCODE** = 1, it carries no additional information. We therefore drop **SEVERITYDESC** and recode **SEVERITYCODE** and **SEVERITYCODE.1** so that '1' corresponds to *'Minor Injury'* and '0' corresponds to *'Property Damage Only'*.

In [9]:
df[['SEVERITYCODE','SEVERITYCODE.1']] = df[['SEVERITYCODE','SEVERITYCODE.1']].replace([2, 1],[1,0])
df.drop(['SEVERITYDESC'], axis = 1, inplace = True)
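
The claims that the two severity columns agree and that each code has exactly one description can be checked programmatically before anything is dropped. This is a minimal sketch on a made-up frame; the names `toy` and `TARGET` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Toy frame with the same three columns (values are made up).
toy = pd.DataFrame({
    'SEVERITYCODE': [2, 1, 1, 2],
    'SEVERITYCODE.1': [2, 1, 1, 2],
    'SEVERITYDESC': ['Injury Collision', 'Property Damage Only Collision',
                     'Property Damage Only Collision', 'Injury Collision'],
})

# True when the two columns hold the same values in every row.
columns_identical = toy['SEVERITYCODE'].equals(toy['SEVERITYCODE.1'])

# Each severity code should come with exactly one description.
descriptions_per_code = toy.groupby('SEVERITYCODE')['SEVERITYDESC'].nunique()
one_to_one = bool((descriptions_per_code == 1).all())

# Explicit mapping to a binary target: 1 -> 0, 2 -> 1.
toy['TARGET'] = toy['SEVERITYCODE'].map({1: 0, 2: 1})
print(columns_identical, one_to_one, toy['TARGET'].tolist())
```

An explicit dictionary passed to `map` makes the direction of the recoding visible at a glance, which is easy to get wrong when old and new codes overlap as they do here.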

Let's also recode the columns **HITPARKEDCAR**, **SPEEDING**, **PEDROWNOTGRNT**, **UNDERINFL** and **INATTENTIONIND** so that *0*, *N* and *NaN* become *0*, and *Y* and *1* become *1*.

In [10]:
df[['HITPARKEDCAR','SPEEDING','PEDROWNOTGRNT','UNDERINFL','INATTENTIONIND']] = \
    df[['HITPARKEDCAR','SPEEDING','PEDROWNOTGRNT','UNDERINFL','INATTENTIONIND']].replace([np.nan, 'N','0'], 0)

df[['HITPARKEDCAR','SPEEDING','PEDROWNOTGRNT','UNDERINFL','INATTENTIONIND']] = \
    df[['HITPARKEDCAR','SPEEDING','PEDROWNOTGRNT','UNDERINFL','INATTENTIONIND']].replace(['Y','1'], 1)
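
The same recoding can be written with a single lookup table. The sketch below is an alternative under the assumption that the flags are stored as strings; `toy` and `flag_map` are illustrative names, and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with two of the flag columns (values are made up).
toy = pd.DataFrame({
    'SPEEDING': ['Y', np.nan, np.nan],
    'UNDERINFL': ['N', '1', '0'],
})

# One lookup table for every spelling of yes/no that is expected.
flag_map = {'Y': 1, '1': 1, 'N': 0, '0': 0}

for column in ['SPEEDING', 'UNDERINFL']:
    # map() turns anything outside flag_map (including NaN) into NaN,
    # which fillna(0) then treats as "no".
    toy[column] = toy[column].map(flag_map).fillna(0).astype(int)

print(toy)
```

Unlike `replace`, `map` does not leave unexpected values untouched: they are silently converted to 0 here, so it is worth inspecting the unique values first.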

Column **STATUS** indicates whether **INCKEY** and **COLDETKEY** are equal. **INCKEY** and **COLDETKEY** are keys, that is, identification numbers that carry no information relevant to our problem, hence we can drop these three columns.

In [11]:
df.drop(['STATUS','INCKEY','COLDETKEY'], axis = 1, inplace = True)

Similarly, **OBJECTID**, **REPORTNO**, **ST_COLCODE**, **ST_COLDESC**, **SDOT_COLCODE**, **SDOT_COLDESC** and **SDOTCOLNUM** are further keys and identification numbers (or their descriptions) which do not seem to provide any information about the factors behind the incidents, hence we also drop these attributes.

In [12]:
df.drop(['OBJECTID', 'REPORTNO', 'ST_COLCODE', 'ST_COLDESC', 'SDOT_COLCODE', 'SDOT_COLDESC', 'SDOTCOLNUM'], axis = 1, inplace = True)

We shall arrange the weather, road and lighting conditions (**WEATHER**, **ROADCOND** and **LIGHTCOND**) into fewer categories so that it is easier to plot and model later.

In [13]:
df['WEATHER'] = df['WEATHER'].replace(['Raining','Snowing','Sleet/Hail/Freezing Rain'],'Precipitation')
df['WEATHER'] = df['WEATHER'].replace(['Overcast','Partly Cloudy'],'Cloudy')
df['WEATHER'] = df['WEATHER'].replace(['Unknown','Other', np.nan],'Unknown')

df['ROADCOND'] = df['ROADCOND'].replace(['Unknown','Other', np.nan],'Unknown')
df['ROADCOND'] = df['ROADCOND'].replace(['Standing Water','Wet'],'Wet')

df['LIGHTCOND'] = df['LIGHTCOND'].replace(['Unknown','Other', np.nan],'Unknown')
df['LIGHTCOND'] = df['LIGHTCOND'].replace(['Dark - No Street Lights','Dark - Street Lights Off'],'Dark - Street Lights Off')
df['LIGHTCOND'] = df['LIGHTCOND'].replace(['Dark - Unknown Lighting','Dark - Street Lights On',],'Dark - Street Lights On')
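
The grouping can also be kept in one dictionary per column, which makes the category scheme easy to read and adjust. The following sketch shows the idea for the weather labels on a made-up series; `weather`, `weather_groups` and `grouped` are illustrative names.

```python
import numpy as np
import pandas as pd

# A few weather labels as they appear before grouping (made-up sample).
weather = pd.Series(['Raining', 'Overcast', 'Clear', np.nan, 'Other', 'Snowing'])

# Original label -> coarser category.
weather_groups = {
    'Raining': 'Precipitation',
    'Snowing': 'Precipitation',
    'Sleet/Hail/Freezing Rain': 'Precipitation',
    'Overcast': 'Cloudy',
    'Partly Cloudy': 'Cloudy',
    'Other': 'Unknown',
}

# replace() leaves labels that are not in the dictionary (e.g. 'Clear')
# unchanged; fillna() then covers the missing values.
grouped = weather.replace(weather_groups).fillna('Unknown')
print(grouped.tolist())
```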

Notice that the **X**, **Y** and **LOCATION** attributes provide precise location information. One could use them to identify the locations where most incidents occur. However, they say nothing about the characteristics of a location, so using them might lead to overfitting without providing insightful results. We therefore drop these columns as well.

In [14]:
df.drop(['X', 'Y', 'LOCATION'], axis = 1, inplace = True)

### Remarks

Since we aim to predict **SEVERITYCODE**, which takes the values 0 and 1 for 'Property Damage Only' and 'Minor Injury' respectively, this is a binary classification problem. We will probably use a Gradient Boosting Classifier, the K-Nearest Neighbours algorithm and Support Vector Machines. <br>
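
To give an idea of how these three classifiers could be compared, here is a minimal sketch using scikit-learn on synthetic data. It is not the modelling code of this project: the features, the labels, the hyperparameters and the 75/25 split are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for the prepared features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Default-like settings; real use would tune these.
models = {
    'Gradient Boosting': GradientBoostingClassifier(random_state=0),
    'K-Nearest Neighbours': KNeighborsClassifier(n_neighbors=5),
    'Support Vector Machine': SVC(kernel='rbf'),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

K-Nearest Neighbours and Support Vector Machines are sensitive to feature scale, so with real features a scaling step would normally precede them.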

For Exploratory Data Analysis we intend to keep the following features:
- **SEVERITYCODE** (aim of the case study)
- **ADDRTYPE**, **JUNCTIONTYPE**, **CROSSWALKKEY**, **SEGLANEKEY** (location information)
- **INCDTTM**, **YEAR**, **MONTH**, **DAYOFWEEK**, **TIMESLOT** (date and time information)
- **PERSONCOUNT**, **PEDCOUNT**, **PEDCYLCOUNT**, **VEHCOUNT**
- **UNDERINFL**, **PEDROWNOTGRNT**, **SPEEDING**, **INATTENTIONIND**, **HITPARKEDCAR**
- **WEATHER**, **ROADCOND**, **LIGHTCOND** <br>

In [15]:
df.head()

Unnamed: 0,SEVERITYCODE,ADDRTYPE,SEVERITYCODE.1,COLLISIONTYPE,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,INCDTTM,JUNCTIONTYPE,...,LIGHTCOND,PEDROWNOTGRNT,SPEEDING,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR,YEAR,MONTH,DAYOFWEEK,TIMESLOT
0,1,Intersection,1,Angles,2,0,0,2,2013-03-27 14:54:00,At Intersection (intersection related),...,Daylight,0,0,0,0,0,2013,3,2,14 - 15
1,0,Block,0,Sideswipe,2,0,0,2,2006-12-20 18:55:00,Mid-Block (not related to intersection),...,Dark - Street Lights On,0,0,0,0,0,2006,12,2,18 - 19
2,0,Block,0,Parked Car,4,0,0,3,2004-11-18 10:20:00,Mid-Block (not related to intersection),...,Daylight,0,0,0,0,0,2004,11,3,10 - 11
3,0,Block,0,Other,3,0,0,3,2013-03-29 09:26:00,Mid-Block (not related to intersection),...,Daylight,0,0,0,0,0,2013,3,4,09 - 10
4,1,Intersection,1,Angles,2,0,0,2,2004-01-28 08:04:00,At Intersection (intersection related),...,Daylight,0,0,0,0,0,2004,1,2,08 - 09


### Search for Trends, Patterns & Correlations

Now we will create some graphs to find out if date and time are relevant in predicting the severity of a possible incident.

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') 

First we will prepare the data for our plots.

In [None]:
df_low = df[df.SEVERITYCODE == 0]
df_high = df[df.SEVERITYCODE == 1]

In [None]:
# YEAR
df_year_low = df_low['YEAR'].value_counts()
df_year_high = df_high['YEAR'].value_counts()

df_year_low = df_year_low.rename('Property Damage Only')
df_year_high = df_year_high.rename('Minor Injury')
df_year = pd.concat([df_year_low, df_year_high], axis=1, sort=False)
df_year.sort_index(inplace = True)

# MONTH
df_month_low = df_low['MONTH'].value_counts()
df_month_high = df_high['MONTH'].value_counts()

df_month_low = df_month_low.rename('Property Damage Only')
df_month_high = df_month_high.rename('Minor Injury')
df_month = pd.concat([df_month_low, df_month_high], axis=1, sort=False)
df_month.sort_index(inplace = True)
df_month = df_month.rename(index={1:'January',2:'February',3:'March',4:'April',5:'May',6:'June', \
                                  7:'July',8:'August',9:'September',10:'October',11:'November',12:'December'})

# DAY OF WEEK
df_dayofweek_low = df_low['DAYOFWEEK'].value_counts()
df_dayofweek_high = df_high['DAYOFWEEK'].value_counts()

df_dayofweek_low = df_dayofweek_low.rename('Property Damage Only')
df_dayofweek_high = df_dayofweek_high.rename('Minor Injury')
df_dayofweek = pd.concat([df_dayofweek_low, df_dayofweek_high], axis=1, sort=False)
df_dayofweek.sort_index(inplace = True)
df_dayofweek = df_dayofweek.rename(index={0:'Monday', 1:'Tuesday',2:'Wednesday',3:'Thursday',4:'Friday',5:'Saturday',6:'Sunday'})

# TIME SLOT
df_timeslot_low = df_low['TIMESLOT'].value_counts()
df_timeslot_high = df_high['TIMESLOT'].value_counts()

df_timeslot_low = df_timeslot_low.rename('Property Damage Only')
df_timeslot_high = df_timeslot_high.rename('Minor Injury')
df_timeslot = pd.concat([df_timeslot_low, df_timeslot_high], axis=1, sort=False)
df_timeslot.sort_index(inplace = True)
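
The count tables built above follow the same pattern each time: count a feature separately for each severity class and put the two counts side by side. As a sketch of a more compact route, `pd.crosstab` produces such a table in one call. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with a feature and the binary severity code (made-up values).
toy = pd.DataFrame({
    'YEAR': [2004, 2004, 2005, 2005, 2005],
    'SEVERITYCODE': [0, 1, 0, 0, 1],
})

# Rows: years (sorted); columns: severity codes; cells: number of incidents.
counts = pd.crosstab(toy['YEAR'], toy['SEVERITYCODE'])

# Give the columns the labels used in the plots.
counts = counts.rename(columns={0: 'Property Damage Only', 1: 'Minor Injury'})
print(counts)
```

Combinations that never occur show up as 0 in a crosstab, whereas concatenating two `value_counts` results leaves them as NaN.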

Now we can plot the data to understand better what is going on.

In [None]:
fig = plt.figure()

fig.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9, wspace=0.3, hspace=0.5)

ax0 = fig.add_subplot(2, 2, 1) 
ax1 = fig.add_subplot(2, 2, 2) 
ax2 = fig.add_subplot(2, 2, 3) 
ax3 = fig.add_subplot(2, 2, 4) 

# Subplot 1:  Line plot YEAR
df_year.plot(kind='line',rot=0,figsize=(17, 12),color=['#5cb85c','#5bc0de'], ax=ax0)

ax0.spines['bottom'].set_linestyle('-')
ax0.spines['bottom'].set_color('black')
ax0.spines['left'].set_linestyle('-')
ax0.spines['left'].set_color('black')
ax0.set_facecolor("white")
ax0.tick_params(labelsize=14)
ax0.set_title("Severity of Incidents Each Year", fontsize=16)
ax0.set_xlabel('Year')
ax0.set_ylabel('Number of Incidents')
ax0.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

# Subplot 2: Bar plot MONTH
df_month.plot(kind='bar', rot=45, figsize=(27, 12), width=0.8, color=['#5cb85c','#5bc0de'], ax=ax1)

ax1.spines['bottom'].set_linestyle('-')
ax1.spines['bottom'].set_color('black')
ax1.spines['left'].set_linestyle('-')
ax1.spines['left'].set_color('black')
ax1.set_facecolor("white")
ax1.tick_params(labelsize=14)
ax1.set_title("Severity of Incidents Each Month", fontsize=16)
ax1.set_xlabel('Month')
ax1.set_ylabel('Number of Incidents')
ax1.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

# Subplot 3: Bar plot DAYOFWEEK
df_dayofweek.plot(kind='bar', rot=0, figsize=(17, 12), width=0.8, color=['#5cb85c','#5bc0de'], ax=ax2)

ax2.spines['bottom'].set_linestyle('-')
ax2.spines['bottom'].set_color('black')
ax2.spines['left'].set_linestyle('-')
ax2.spines['left'].set_color('black')
ax2.set_facecolor("white")
ax2.tick_params(labelsize=14)
ax2.set_title("Severity of Incidents Each Day of Week", fontsize=16)
ax2.set_xlabel('Day of Week')
ax2.set_ylabel('Number of Incidents')
ax2.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

# Subplot 4:  Line plot TIMESLOT
df_timeslot.plot(kind='line',rot=0,figsize=(27, 12),color=['#5cb85c','#5bc0de'], ax=ax3)

ax3.spines['bottom'].set_linestyle('-')
ax3.spines['bottom'].set_color('black')
ax3.spines['left'].set_linestyle('-')
ax3.spines['left'].set_color('black')
ax3.set_facecolor("white")
ax3.tick_params(labelsize=14)
ax3.set_title("Severity of Incidents Time of Day", fontsize=16)
ax3.set_xlabel('Hours')
ax3.set_ylabel('Number of Incidents')
ax3.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

plt.show()

In [None]:
df_weather_low = df_low['WEATHER'].value_counts()
df_weather_high = df_high['WEATHER'].value_counts()

df_weather_low = df_weather_low.rename('Property Damage Only')
df_weather_high = df_weather_high.rename('Minor Injury')
df_weather = pd.concat([df_weather_low, df_weather_high], axis=1, sort=False)

# Convert counts to row-wise shares (each weather category sums to 1), rounded to two decimals
df_weather_p = df_weather.div(df_weather.sum(axis=1, skipna=False), axis=0).round(2)

    
df_road_low = df_low['ROADCOND'].value_counts()
df_road_high = df_high['ROADCOND'].value_counts()

df_road_low = df_road_low.rename('Property Damage Only')
df_road_high = df_road_high.rename('Minor Injury')
df_road = pd.concat([df_road_low, df_road_high], axis=1, sort=False)

# Convert counts to row-wise shares (each road condition sums to 1), rounded to two decimals
df_road_p = df_road.div(df_road.sum(axis=1, skipna=False), axis=0).round(2)
    

df_light_low = df_low['LIGHTCOND'].value_counts()
df_light_high = df_high['LIGHTCOND'].value_counts()

df_light_low = df_light_low.rename('Property Damage Only')
df_light_high = df_light_high.rename('Minor Injury')
df_light = pd.concat([df_light_low, df_light_high], axis=1, sort=False)

# Convert counts to row-wise shares (each light condition sums to 1), rounded to two decimals
df_light_p = df_light.div(df_light.sum(axis=1, skipna=False), axis=0).round(2)
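
The tables above express, for each condition, which share of incidents falls into each severity class. As a sketch of an equivalent shortcut, `pd.crosstab` with `normalize='index'` computes such row-wise shares directly. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with a condition column and the binary severity code.
toy = pd.DataFrame({
    'WEATHER': ['Clear', 'Clear', 'Clear', 'Clear',
                'Precipitation', 'Precipitation'],
    'SEVERITYCODE': [0, 0, 0, 1, 0, 1],
})

# normalize='index' divides each row by its total, so every row sums to 1.
shares = pd.crosstab(toy['WEATHER'], toy['SEVERITYCODE'],
                     normalize='index').round(2)
shares = shares.rename(columns={0: 'Property Damage Only', 1: 'Minor Injury'})
print(shares)
```

Comparing shares rather than raw counts keeps frequent conditions from dominating the picture simply because they occur more often.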

In [None]:
fig = plt.figure()

fig.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9, wspace=1.0, hspace=1.0)

ax0 = fig.add_subplot(3, 1, 1) 
ax1 = fig.add_subplot(3, 1, 2) 
ax2 = fig.add_subplot(3, 1, 3)  

# Subplot 1:  Bar Plot WEATHER PERCENTAGE
df_weather_p.plot(kind='bar',rot=45,width=0.8,figsize=(15, 12),color=['#5cb85c','#5bc0de'], ax=ax0)

ax0.spines['bottom'].set_linestyle('-')
ax0.spines['bottom'].set_color('black')
ax0.spines['left'].set_linestyle('-')
ax0.spines['left'].set_color('black')
ax0.set_facecolor("white")
ax0.tick_params(labelsize=14)
ax0.set_title("Severity of Incidents under Different Weather Conditions", fontsize=16)
ax0.set_xlabel('Weather')
ax0.set_ylabel('Percentage of incidents')
ax0.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

# Subplot 2: Bar Plot ROAD CONDITIONS PERCENTAGE
df_road_p.plot(kind='bar', rot=45, figsize=(15, 12), width=0.8, color=['#5cb85c','#5bc0de'], ax=ax1)

ax1.spines['bottom'].set_linestyle('-')
ax1.spines['bottom'].set_color('black')
ax1.spines['left'].set_linestyle('-')
ax1.spines['left'].set_color('black')
ax1.set_facecolor("white")
ax1.tick_params(labelsize=14)
ax1.set_title("Severity of Incidents under Different Road Conditions", fontsize=16)
ax1.set_xlabel('Road Conditions')
ax1.set_ylabel('Percentage of incidents')
ax1.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

# Subplot 3: Bar plot LIGHT CONDITIONS PERCENTAGE
df_light_p.plot(kind='bar', rot=45, figsize=(15, 12), width=0.8, color=['#5cb85c','#5bc0de'], ax=ax2)

ax2.spines['bottom'].set_linestyle('-')
ax2.spines['bottom'].set_color('black')
ax2.spines['left'].set_linestyle('-')
ax2.spines['left'].set_color('black')
ax2.set_facecolor("white")
ax2.tick_params(labelsize=14)
ax2.set_title("Severity of Incidents under Different Light Conditions", fontsize=16)
ax2.set_xlabel('Light Conditions')
ax2.set_ylabel('Percentage of incidents')
ax2.legend(['Property Damage Only','Minor Injury'], loc='upper right', fontsize=14, facecolor='white')

plt.show()