## Toronto Accident Analysis 2006 - 2019

- This project gives detailed insight into **long-term trends in serious road accidents in Toronto between 2006 and 2019,** including, but not limited to, casualties due to road accidents, the areas most affected, aggressive driving, road types, geographical regions known for accidents, and environmental conditions. It also examines unexpected jumps, such as a pronounced peak in fatal traffic-related collisions involving cyclists between 2012 and 2013.


- **This project is a work in progress and will be updated regularly.**

#### More information about the dataset is available at [open.canada.ca](https://open.canada.ca/data/en/dataset/1eb9eba7-71d1-4b30-9fb1-30cbdab7e63a)

### Motivation

The increase in traffic accidents, injuries, and fatalities since 2006 was the motivation for using this dataset. This analysis can be a first step towards making Toronto's streets safer.


Toronto ranks among the most visited cities in North America and has a population of more than 3 million. Reducing the risk of accidents is essential to keep Torontonians safe and to maintain Toronto's reputation as a desirable tourist destination.

### Questions:

In this project, 10 problem statements were developed. The goal of the project is to address them through analysis supported by data visualization.


1. 

   a.  What is the rate of road accidents (i.e. the number of casualties) in Toronto between 2006 - 2019?
   
   b.  What is the rate of the road accidents based on different Toronto Regions?



2. Which ***highway authorities are the most dangerous or safest*** in Toronto based on accident records between 2006 - 2019?  ***We will look at only the top 20 of them.***



3. What is the accident occurrence rate in Toronto ***based on time of day, day of the week, and month of the year*** between 2006 - 2019?



4. 

   a. Which particular road network group (based on network density range) is the most dangerous (i.e. has the highest number of casualties) between 2006 - 2019?
   
   b. Which road type has the highest rate of road accidents between 2006 - 2019?


5. What condition **(taking all conditions into account i.e. road, weather and light conditions)** caused the most road accidents in Toronto between 2006 - 2019?



6. Are pedestrian crossings a cause of road accidents, or do they influence road casualties in Toronto?



7. 

     a. How many casualties occur per accident in Toronto, and what is their distribution across all road accidents that occurred between 2006 - 2019?

     b. How many vehicles are involved in each road accident, and what is their distribution across all road accidents that occurred between 2006 - 2019?




8. Which speed limit is most closely associated with road accidents in Toronto from 2006 - 2019?



9. 

    a. In which areas (urban / rural) of Toronto were road accidents most frequent from 2006 - 2019?

    b. What is the distribution of accident severity in Toronto from 2006 - 2019?

In [891]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

In [892]:
df= pd.read_csv('KSI.csv', parse_dates= ['DATE'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16093 entries, 0 to 16092
Data columns (total 56 columns):
 #   Column         Non-Null Count  Dtype              
---  ------         --------------  -----              
 0   X              16093 non-null  float64            
 1   Y              16093 non-null  float64            
 2   Index_         16093 non-null  int64              
 3   ACCNUM         16093 non-null  int64              
 4   YEAR           16093 non-null  int64              
 5   DATE           16093 non-null  datetime64[ns, UTC]
 6   TIME           16093 non-null  int64              
 7   HOUR           16093 non-null  int64              
 8   STREET1        16093 non-null  object             
 9   STREET2        14698 non-null  object             
 10  OFFSET         2388 non-null   object             
 11  ROAD_CLASS     15725 non-null  object             
 12  District       16080 non-null  object             
 13  WardNum        13795 non-null  float64        

In [893]:
df['year'] = df['DATE'].dt.year
df['month'] = df['DATE'].dt.month
df['day'] = df['DATE'].dt.day
df_clean = df.replace(' ', np.nan, regex=False)
#df_clean['WEEKDAY']= df_clean['WEEKDAY'].astype('str')

In [894]:
print(df_clean.isna().sum()/len(df_clean)*100)

X                 0.000000
Y                 0.000000
Index_            0.000000
ACCNUM            0.000000
YEAR              0.000000
DATE              0.000000
TIME              0.000000
HOUR              0.000000
STREET1           0.000000
STREET2           8.668365
OFFSET           85.161250
ROAD_CLASS        2.286709
District          0.080780
WardNum          14.279500
Division          0.000000
LATITUDE          0.000000
LONGITUDE         0.000000
LOCCOORD          0.807805
ACCLOC           33.865656
TRAFFCTL          0.180203
VISIBILITY        0.136705
LIGHT             0.000000
RDSFCOND          0.167775
ACCLASS           0.000000
IMPACTYPE         0.024856
INVTYPE           0.062139
INVAGE            0.000000
INJURY            9.998136
FATAL_NO         95.818058
INITDIR          30.050332
VEHTYPE          15.702479
MANOEUVER        43.198906
DRIVACT          50.090101
DRIVCOND         50.090101
PEDTYPE          84.030324
PEDACT           84.067607
PEDCOND          83.321941
C

In [895]:
# Dropping the columns with more than 80% Nan values
df_clean = df_clean.drop(['OFFSET', 'PEDTYPE', 'PEDACT', 'PEDCOND', 'CYCLISTYPE', 'CYCACT', 'CYCCOND'], axis=1)
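
The column list above is typed out by hand. As a minimal sketch, the same kind of selection can be derived from the missing-value fractions instead; the toy frame and the `threshold` name are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_clean (hypothetical values, not the KSI data)
toy = pd.DataFrame({
    "STREET1": ["A", "B", "C", "D", "E"],
    "OFFSET": [np.nan, np.nan, np.nan, "10 m", "5 m"],    # 60% missing
    "PEDTYPE": [np.nan, np.nan, np.nan, np.nan, np.nan],  # 100% missing
})

threshold = 0.8  # fraction of missing values above which a column is dropped

# Fraction of missing values per column
missing_frac = toy.isna().mean()

# Names of the columns strictly above the threshold
to_drop = missing_frac[missing_frac > threshold].index.tolist()

toy_reduced = toy.drop(columns=to_drop)
print(to_drop)
print(list(toy_reduced.columns))
```

Applied to the real data, a rule like this would also catch `FATAL_NO` (about 95.8% missing in the output above), which the hand-written list keeps, so an explicit list of exceptions may still be needed.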

In [896]:
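# Sorted INVAGE labels, reused as x-axis values for the age-group plot below
age_group = [age_group for age_group,df in df_clean.groupby('INVAGE')]
# Remove two labels by position: the first one, then index 18 of the remaining list
del age_group[0]
del age_group[18]
# Sorted years and months, reused as x-axis values in the plots below
year = [year for year,df in df_clean.groupby('year')]
month = [month for month,df in df_clean.groupby('month')]
# Records whose injury severity is Major or Fatal
Major_Fatal = df_clean[ (df_clean['INJURY'] == 'Major') | (df_clean['INJURY'] == 'Fatal')]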

### 1a. What is the rate of road accidents (i.e. the number of casualties) in Toronto between 2006 - 2019?
### 1b. What is the rate of the road accidents based on different Toronto Regions?

In [922]:
ACCNUM = df_clean.groupby('year')['Index_'].nunique()
#ACCNUM = df_clean['ACCNUM'].value_counts()
fig = px.line(df_clean, x = year, y = ACCNUM, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Records of Serious Road Accidents (Casualties per year) in Toronto between 2006 - 2019",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

**Observations**

- Despite some fluctuation, the overall number of serious road accidents is declining. There were spikes in 2012 and 2018; after the 2018 spike, the number of accidents dropped significantly.
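
To put numbers on an observation like this, yearly counts can be converted into year-over-year percentage changes. The sketch below uses made-up counts (not taken from the KSI data); in the notebook, `yearly` would presumably be the per-year series computed in the cell above.

```python
import pandas as pd

# Hypothetical yearly record counts, indexed by year (not real KSI figures)
yearly = pd.Series([100, 90, 120, 60], index=[2010, 2011, 2012, 2013], name="records")

# Percentage change relative to the previous year; the first year has no predecessor (NaN)
yoy_pct = yearly.pct_change() * 100

# Year with the largest relative increase (NaN entries are skipped)
spike_year = yoy_pct.idxmax()

print(yoy_pct.round(1))
print(spike_year)
```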

In [923]:
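district = [district for district,df in df_clean.groupby('District')]
# value_counts() sorts by count, so sort_index() restores the alphabetical order of `district`
fig = px.bar(df_clean, x = district, y = df_clean['District'].value_counts().sort_index(), labels={'x':'','y':''})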
colors = ['green'] * 4
colors[0] = 'red'
colors[1] = 'blue'
colors[3] = 'gray'
fig.update_traces(marker_color=colors)
fig.update_layout(
    title={
        'text': "Rates of Serious Road Accidents in different Toronto districts",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()
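
When category labels and their counts are built separately and passed as plain sequences, they are matched by position, so their order matters. A minimal sketch on a made-up district column (the values are illustrative only) shows how the ordering of `value_counts()` differs from an alphabetical label list:

```python
import pandas as pd

# Toy district column (hypothetical values, not the KSI data)
districts = pd.Series([
    "North York", "Scarborough", "Scarborough",
    "Etobicoke York", "Scarborough", "North York",
])

# Alphabetical labels, like the keys produced by groupby('District')
labels = sorted(districts.dropna().unique())

# value_counts() orders by count (descending), not by label
by_count = districts.value_counts()

# sort_index() puts the counts back into alphabetical label order
by_label = by_count.sort_index()

print(labels)
print(list(by_count.index))
print(list(by_label.index))
```

Taking both the labels and the values from the same series (`by_label.index` and `by_label.values`) avoids the question of alignment altogether.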

In [924]:
ACCNUM = df_clean.groupby('month')['ACCNUM'].nunique()

fig = px.bar(df_clean, x = month, y = ACCNUM, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Records of Serious Road Accidents (Casualties per month) in Toronto",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [929]:
fig = go.Figure(data=[
    go.Bar(name='Non-Fatal', x= year, y= df_clean[df_clean['ACCLASS']== 'Non-Fatal Injury'].groupby('year')['Index_'].nunique()),
    go.Bar(name='Fatal', x=year , y= df_clean[df_clean['ACCLASS']== 'Fatal'].groupby('year')['Index_'].nunique())])
# Change the bar mode
fig.update_layout(barmode='group',
    title={
        'text': "Comparing Fatal and Non-Fatal Accidents (Casualties per year) in Toronto",
        'y':0.89,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [930]:
fig = px.bar(df_clean, x = year, y = df_clean[df_clean['INJURY']== 'Fatal'].groupby('year')['Index_'].nunique(), labels={'x':'','y':''})
fig.update_traces(marker_color='red')
fig.update_layout(
    title={
        'text': "Records of Fatal Road Accidents (Fatal Injuries per year) in Toronto",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [931]:
fig = px.bar(df_clean, x = df_clean['Neighbourhood'].value_counts().index , y = df_clean['Neighbourhood'].value_counts(), labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Rates of Road Accidents in different Toronto neighbourhoods",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [903]:
ACCNUM= df_clean.groupby('VISIBILITY')['Index_'].nunique().sort_values(ascending = False)
fig = px.bar(df_clean, x = ACCNUM.index, y = ACCNUM, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Environment Condition at the time of road accidents in Toronto between 2006 - 2019",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_xaxes(tickangle = -45)
fig.show()

In [935]:
Major_Fatal_CYCLIST = Major_Fatal[Major_Fatal['CYCLIST'] == 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_CYCLIST, labels = {'x':'','y':''})
fig.update_layout(
    title = {
        'text': "Records of Serious Road Accidents (Major or Fatal Injuries per year) Where a Cyclist Was Involved",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(range = [31, 80])
fig.show()

**Observation**

A possible reason for the jump in 2012 is the removal of bicycle lanes following the election of Mayor Rob Ford.

In [936]:
Major_Fatal_CYCLIST= Major_Fatal[Major_Fatal['CYCLIST']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_CYCLIST, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Records of Serious Road Accidents (Major or Fatal Injuries per month) Where a Cyclist Was Involved",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [934]:
Major_Fatal_CYCLIST= Major_Fatal[Major_Fatal['CYCLIST']== 'Yes'].groupby('INVAGE')['Index_'].nunique()
fig = px.bar(df_clean, x = age_group, y = Major_Fatal_CYCLIST, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Records of Serious Road Accidents by different age group Where a cyclist was involved",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [907]:
Major_Fatal_AG= Major_Fatal[Major_Fatal['AG_DRIV']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_AG, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal injuries where aggressive driving played a role by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [908]:
Major_Fatal_AG= Major_Fatal[Major_Fatal['AG_DRIV']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_AG, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal injuries where aggressive driving played a role by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [909]:
#Major_Fatal_AG= Major_Fatal[Major_Fatal['AG_DRIV']== 'Yes'].groupby('INVAGE')['Index_'].nunique()
#fig = px.bar(df_clean, x = age_group, y = Major_Fatal_AG, labels={'x':'','y':''})
#fig.update_layout(
   # title={
    #    'text': "Number of serious or fatal injuries where a cyclist was involved by age drive",
     #   'y':0.95,
      #  'x':0.5,
       # 'xanchor': 'center',
        #'yanchor': 'top'})
#fig.show()

In [910]:
Major_Fatal_ALC= Major_Fatal[Major_Fatal['ALCOHOL']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_ALC, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where alcohol consumption played a role by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [911]:
Major_Fatal_ALC= Major_Fatal[Major_Fatal['ALCOHOL']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_ALC, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where alcohol consumption played a role by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [912]:
Major_Fatal_AUT= Major_Fatal[Major_Fatal['AUTOMOBILE']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_AUT, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a driver of an automobile was involved by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(range = [301, 600])
fig.show()

In [913]:
Major_Fatal_AUT= Major_Fatal[Major_Fatal['AUTOMOBILE']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_AUT, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a driver of an automobile was involved by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [914]:
Major_Fatal_MOT= Major_Fatal[Major_Fatal['MOTORCYCLE']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_MOT, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a motorcycle was involved by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [915]:
Major_Fatal_MOT= Major_Fatal[Major_Fatal['MOTORCYCLE']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_MOT, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a motorcycle was involved by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [916]:
Major_Fatal_SPD= Major_Fatal[Major_Fatal['SPEEDING']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_SPD, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where speeding played a role by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [917]:
Major_Fatal_SPD= Major_Fatal[Major_Fatal['SPEEDING']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_SPD, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where speeding played a role by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [918]:
Major_Fatal_RED= Major_Fatal[Major_Fatal['REDLIGHT']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_RED, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where red light running played a role by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(range = [12, 60])
fig.show()

In [919]:
Major_Fatal_RED= Major_Fatal[Major_Fatal['REDLIGHT']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_RED, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where red light running played a role by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [920]:
Major_Fatal_PED= Major_Fatal[Major_Fatal['PEDESTRIAN']== 'Yes'].groupby('year')['Index_'].nunique()
fig = px.line(df_clean, x = year, y = Major_Fatal_PED, labels={'x':'Year','y':'Number of Injuries=Fatal'})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a pedestrian was involved by year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [921]:
Major_Fatal_PED= Major_Fatal[Major_Fatal['PEDESTRIAN']== 'Yes'].groupby('month')['Index_'].nunique()
fig = px.bar(df_clean, x = month, y = Major_Fatal_PED, labels={'x':'','y':''})
fig.update_layout(
    title={
        'text': "Number of serious or fatal collisions where a pedestrian was involved by month",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()
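
The cells above repeat one pattern: filter on a Yes/No factor column, group by year or month, and count unique `Index_` values. As a sketch only, that pattern could be wrapped in a small helper; `count_by` and its parameters are hypothetical names introduced here, and the toy frame stands in for `Major_Fatal`.

```python
import pandas as pd

def count_by(frame, flag, period, all_periods):
    """Count unique Index_ values per period for rows where `flag` equals 'Yes'."""
    counts = frame[frame[flag] == "Yes"].groupby(period)["Index_"].nunique()
    # Reindex so that periods without matching rows show up as 0 instead of vanishing,
    # which keeps the result the same length as the x-axis list
    return counts.reindex(all_periods, fill_value=0)

# Toy stand-in for Major_Fatal (hypothetical rows, not the KSI data)
toy = pd.DataFrame({
    "Index_": [1, 2, 3, 4, 5],
    "year": [2006, 2006, 2007, 2008, 2008],
    "SPEEDING": ["Yes", "Yes", None, "Yes", None],
    "ALCOHOL": [None, "Yes", None, None, None],
})
years = [2006, 2007, 2008]

speeding = count_by(toy, "SPEEDING", "year", years)
alcohol = count_by(toy, "ALCOHOL", "year", years)

print(speeding.tolist())
print(alcohol.tolist())
```

Each result could then be plotted with a call such as `px.line(x=speeding.index, y=speeding.values)`, so that the labels and the values always come from the same series.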