# Exploring trends in the cleaned data
This notebook explores trends in the crash data that appear in the article published alongside the crash dashboard.

Before running this notebook, run the cleaning scripts in the `cleaning-workflow/cleaning-scripts` folder on GitHub, following the instructions in the `readme` document. This notebook analyzes the `master-crashes.csv` file, which is generated at the end of the cleaning process.

In [1]:
# basic data/viz libraries
import pandas as pd
from matplotlib import pyplot as plt
from datetime import date, time

import altair as alt
# saving data into a file rather than embedding into the chart
alt.data_transformers.enable('json')
alt.renderers.enable('default')

RendererRegistry.enable('default')

In [64]:
# read in the cleaned master csv
crash_df = pd.read_csv('../data/clean-data/master-crashes.csv', low_memory=False)  

## Trends over time

First, let's see some time trends. How has the frequency of crashes and crash injuries/fatalities changed over the time span of the dataset? 

In [31]:
def get_year(date):
    return pd.to_datetime(date).year

In [32]:
def get_month(date):
    return pd.to_datetime(date).month

In [67]:
# parse the timestamps once, then pull out the year and month
crash_datetimes = pd.to_datetime(crash_df['DateTime'])
crash_df['Year'] = crash_datetimes.dt.year
crash_df['Month'] = crash_datetimes.dt.month

In [65]:
# set up function to create line chart with altair
def make_line_chart(source, title, values, x_axis, y_axis):
    return alt.Chart(source, 
          width=500,
          height=300, 
          title=title,
    ).mark_line().encode(
        alt.X(x_axis, axis=alt.Axis(
                values=values,
                grid=True,
                labelAngle=0)
             ),
        alt.Y(y_axis),
    )

In [94]:
annual_crash_counts = crash_df.groupby(by="Year").size().reset_index().rename(columns={0:"Crash count"})
# annual_crash_counts

In [104]:
source = annual_crash_counts
title='Car crashes year by year in Monroe County, IN'
values=annual_crash_counts['Year'].unique()
x_axis='Year:N'
y_axis='Crash count:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [135]:
monthly_crash_counts = crash_df.groupby(by="Month").size().reset_index().rename(columns={0:"Crash count"})
# average over the years in the dataset (not over the number of months)
monthly_crash_counts['Crash count'] = monthly_crash_counts['Crash count'] / crash_df['Year'].nunique()
# monthly_crash_counts

In [134]:
# total crashes per month
source = monthly_crash_counts
title='Average number of car crashes per month (data from 2003-2022)'
values=monthly_crash_counts['Month'].unique()
# values=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
x_axis='Month:Q'
y_axis='Crash count:Q'

make_line_chart(source, title, values, x_axis, y_axis)
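As a cross-check on the monthly average, one option is to count crashes for each (year, month) pair first and then average those counts across years. The sketch below assumes `Year` and `Month` columns like the ones created above; the helper name `average_monthly_counts` and the toy data are illustrative only and are not part of the original analysis.

```python
import pandas as pd

def average_monthly_counts(df):
    """Average number of rows per calendar month, averaged across years."""
    # one count per (Year, Month) pair
    per_year_month = df.groupby(['Year', 'Month']).size()
    # average those counts within each calendar month
    return (per_year_month.groupby(level='Month').mean()
            .reset_index(name='Crash count'))

# toy data: two crashes in Jan 2020, one each in Feb 2020, Jan 2021, Feb 2021
toy = pd.DataFrame({
    'Year':  [2020, 2020, 2020, 2021, 2021],
    'Month': [1, 1, 2, 1, 2],
})
monthly_avg = average_monthly_counts(toy)
print(monthly_avg)
```

Note that a (year, month) pair with no crashes at all is simply absent from the grouped counts, so it does not pull the average down; for a dataset with a partial first or last year that may or may not be the behavior you want.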

In [150]:
annual_fatalities = crash_df[crash_df['Number Dead'] > 0].groupby(by="Year").size().reset_index().rename(columns={0:"Fatality count"})
# annual_fatalities


In [10]:
# fatal crashes per year (counts crashes with at least one death, not individual deaths)
# 2003-2012 ARE LOW ESTIMATES
counts = []
years = crash_df['Year'].unique()
for year in years:
    year_df = crash_df[crash_df['Year']==year]
    counts.append(year_df[year_df['Number Dead'] > 0]['Year'].count())
    
years, counts = zip(*sorted(zip(years, counts)))
source = pd.DataFrame({
  'Year': years,
  'Number of Crash Deaths': counts
})
title='Crash deaths by year in Monroe County, IN (data from 2003-22)'
values=years
x_axis='Year:O'
y_axis='Number of Crash Deaths:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [143]:
# fatalities per year as a proportion of total crashes
# 2003-2012 ARE LOW ESTIMATES
counts = []
years = crash_df['Year'].unique()
for year in years:
    year_df = crash_df[crash_df['Year']==year]
    counts.append(year_df[year_df['Number Dead'] > 0]['Year'].count() / year_df.shape[0] * 1000)
    
years, counts = zip(*sorted(zip(years, counts)))
source = pd.DataFrame({
  'Year': years,
  'Deaths per every 1,000 crashes': counts
})
title='Deaths per every 1,000 crashes'
values=years
x_axis='Year:O'
y_axis='Deaths per every 1,000 crashes:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [144]:
# injury crashes per year (counts crashes with at least one injury, not individual injuries)
# 2003-2012 ARE LOW ESTIMATES
counts = []
years = crash_df['Year'].unique()
for year in years:
    year_df = crash_df[crash_df['Year']==year]
    counts.append(year_df[year_df['Number Injured'] > 0]['Year'].count())
    
years, counts = zip(*sorted(zip(years, counts)))
source = pd.DataFrame({
  'Year': years,
  'Number of Crash Injuries': counts
})
title='Crash injuries by year in Monroe County, IN (data from 2003-22)'
values=years
x_axis='Year:O'
y_axis='Number of Crash Injuries:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [145]:
# injuries per year as a proportion of total crashes
# 2003-2012 ARE LOW ESTIMATES
counts = []
years = crash_df['Year'].unique()
for year in years:
    year_df = crash_df[crash_df['Year']==year]
    counts.append(year_df[year_df['Number Injured'] > 0]['Year'].count() / year_df.shape[0] * 1000)
    
years, counts = zip(*sorted(zip(years, counts)))
source = pd.DataFrame({
  'Year': years,
  'Injuries per every 1,000 crashes': counts
})
title='Injuries per every 1,000 crashes'
values=years
x_axis='Year:O'
y_axis='Injuries per every 1,000 crashes:Q'

make_line_chart(source, title, values, x_axis, y_axis)
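The four yearly cells above repeat the same loop with a different column and scale. A possible way to fold them into one helper is sketched below using a grouped aggregation. The function name `crashes_with_outcome_by_year` and the output column names are hypothetical, and the toy frame only mimics the `Year` and `Number Injured` columns of `crash_df`.

```python
import pandas as pd

def crashes_with_outcome_by_year(df, column):
    """Per year: total crashes, crashes where `column` > 0, and that rate per 1,000."""
    flagged = df.assign(has_outcome=df[column] > 0)
    out = flagged.groupby('Year').agg(
        total_crashes=('has_outcome', 'size'),        # every crash in the year
        crashes_with_outcome=('has_outcome', 'sum'),  # crashes where column > 0
    ).reset_index()
    out['per_1000'] = out['crashes_with_outcome'] / out['total_crashes'] * 1000
    return out

# toy data: 1 of 4 crashes in 2020 and 1 of 2 crashes in 2021 involve an injury
toy = pd.DataFrame({
    'Year': [2020, 2020, 2020, 2020, 2021, 2021],
    'Number Injured': [0, 1, 0, 0, 0, 3],
})
summary = crashes_with_outcome_by_year(toy, 'Number Injured')
print(summary)
```

Because the result is already sorted by `Year`, it could be passed to `make_line_chart` directly, with `'per_1000:Q'` or `'crashes_with_outcome:Q'` as the y-axis encoding.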

In [64]:
# proportion of crashes that resulted in injury each month
months = crash_df['Month'].unique()
counts = []
for month in months:
    month_df = crash_df[crash_df['Month']==month]
    counts.append(month_df[month_df['Number Injured'] > 0]['Month'].count() / month_df.shape[0])
months, counts = zip(*sorted(zip(months, counts)))
month_names = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
months = month_names

source = pd.DataFrame({
  'Month': months,
  'Proportion of Injuries': counts
})
title='Proportion of Crashes that Involved Injury by Month'
values=month_names
x_axis='Month:O'
y_axis='Proportion of Injuries:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [65]:
# proportion of crashes that resulted in death each month
months = crash_df['Month'].unique()
counts = []
for month in months:
    month_df = crash_df[crash_df['Month']==month]
    counts.append(month_df[month_df['Number Dead'] > 0]['Month'].count() / month_df.shape[0])
months, counts = zip(*sorted(zip(months, counts)))
month_names = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
months = month_names

source = pd.DataFrame({
  'Month': months,
  'Proportion of Fatalities': counts
})
title='Proportion of Crashes that Involved Fatalities by Month'
values=month_names
x_axis='Month:O'
y_axis='Proportion of Fatalities:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [160]:
# ped_crashes_annual = 
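# align on Year rather than on row position, since annual_fatalities
# only contains years with at least one fatal crash
ped_crashes_by_year = crash_df[crash_df['Pedestrian Involved'] == True].groupby(by="Year").size()
bike_crashes_by_year = crash_df[crash_df['Cyclist Involved'] == True].groupby(by="Year").size()
annual_fatalities['Ped crashes'] = annual_fatalities['Year'].map(ped_crashes_by_year)
annual_fatalities['Bike crashes'] = annual_fatalities['Year'].map(bike_crashes_by_year)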
# annual_fatalities['Ped deaths'] = 
crash_df[(crash_df['Pedestrian Involved'] == True) & (crash_df['Number Dead'] > 0)].groupby(by="Year").size().reset_index()
# annual_fatalities


Unnamed: 0,Year,0
0,2008,3
1,2009,1
2,2010,1
3,2012,1
4,2014,2
5,2016,1
6,2017,1
7,2018,3
8,2019,2
9,2020,3


In [170]:
crash_df[['Year','Number Injured','Number Dead','Pedestrian Involved']].groupby(by="Year").count()

Year,Number Injured,Number Dead,Pedestrian Involved
2003,4625,4625,7
2004,4269,4269,7
2005,3967,3967,7
2006,3937,3937,36
2007,4055,4055,52
2008,4341,4341,55
2009,4014,4014,51
2010,4060,4060,64
2011,3914,3914,77
2012,4223,4223,62


In [175]:
# what is the proportion of fatal crashes when pedestrians or cyclists
# are involved, compared with the overall fatality rate? 
# exclude years 2003-2005, which are outliers
bikeped_crashes = crash_df[crash_df['Year'] > 2005]

ped_involved = bikeped_crashes['Pedestrian Involved'] == True
bike_involved = bikeped_crashes['Cyclist Involved'] == True
bikeped_crashes[ped_involved | bike_involved]

# combine masks with & so each filter is applied in one step
# (chained boolean indexing triggers a pandas reindexing warning)
fatal = bikeped_crashes['Number Dead'] > 0
fatality_rate_overall = bikeped_crashes[fatal].shape[0] / bikeped_crashes.shape[0]
fatality_rate_ped = bikeped_crashes[ped_involved & fatal].shape[0] / bikeped_crashes[ped_involved].shape[0]
fatality_rate_bike = bikeped_crashes[bike_involved & fatal].shape[0] / bikeped_crashes[bike_involved].shape[0]
fatality_rate_bike_ped = bikeped_crashes[(ped_involved | bike_involved) & fatal].shape[0] / bikeped_crashes[ped_involved | bike_involved].shape[0]

print('Overall fatality rate:', fatality_rate_overall * 100, "%")
print('Pedestrian-involved crash fatality rate:', fatality_rate_ped * 100, "%")
print('Cyclist-involved crash fatality rate:', fatality_rate_bike * 100, "%")
print('Ped or cyclist involved crash fatality rate:', fatality_rate_bike_ped * 100, "%")


Overall fatality rate: 0.2359200088667923 %
Pedestrian-involved crash fatality rate: 2.1621621621621623 %
Cyclist-involved crash fatality rate: 0.1402524544179523 %
Ped or cyclist involved crash fatality rate: 1.2239902080783354 %




In [176]:
# what about injury rates? 

# what is the proportion of injury crashes when pedestrians or cyclists
# are involved, compared with the overall injury rate?
injured = bikeped_crashes['Number Injured'] > 0
injury_rate_overall = bikeped_crashes[injured].shape[0] / bikeped_crashes.shape[0]
injury_rate_ped = bikeped_crashes[ped_involved & injured].shape[0] / bikeped_crashes[ped_involved].shape[0]
injury_rate_bike = bikeped_crashes[bike_involved & injured].shape[0] / bikeped_crashes[bike_involved].shape[0]
injury_rate_bike_ped = bikeped_crashes[(ped_involved | bike_involved) & injured].shape[0] / bikeped_crashes[ped_involved | bike_involved].shape[0]

injury_rate_overall * 100, injury_rate_ped * 100, injury_rate_bike * 100, injury_rate_bike_ped * 100

print('Overall injury rate:', injury_rate_overall * 100, "%")
print('Pedestrian-involved crash injury rate:', injury_rate_ped * 100, "%")
print('Cyclist-involved crash injury rate:', injury_rate_bike * 100, "%")
print('Ped or cyclist involved crash injury rate:', injury_rate_bike_ped * 100, "%")


Overall injury rate: 21.299301740107985 %
Pedestrian-involved crash injury rate: 87.67567567567568 %
Cyclist-involved crash injury rate: 81.62692847124825 %
Ped or cyclist involved crash injury rate: 85.06731946144431 %
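The fatality and injury rates above all follow one pattern: restrict to a subset of crashes, then take the share of that subset with a positive count in some column. A minimal sketch of that pattern is below. The helper `share_with_outcome` is a hypothetical name, and the toy frame only imitates the `Pedestrian Involved` and `Number Injured` columns (with missing values standing in for "not involved", as in the data preview).

```python
import pandas as pd

def share_with_outcome(df, subset_mask, outcome_column):
    """Fraction of rows in df[subset_mask] where outcome_column > 0."""
    subset = df[subset_mask]
    if subset.empty:
        # avoid dividing by zero when no rows match the subset
        return float('nan')
    return (subset[outcome_column] > 0).mean()

toy = pd.DataFrame({
    'Pedestrian Involved': [True, None, True, None],
    'Number Injured': [1, 0, 1, 0],
})
ped_mask = toy['Pedestrian Involved'] == True
all_rows = pd.Series(True, index=toy.index)

ped_injury_share = share_with_outcome(toy, ped_mask, 'Number Injured')
overall_injury_share = share_with_outcome(toy, all_rows, 'Number Injured')
print(ped_injury_share, overall_injury_share)
```

Taking the mean of a boolean Series gives the share of `True` values, which is the same quantity as dividing the two `.shape[0]` counts.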




In [30]:
crash_df

Unnamed: 0,Vehicles Involved,Number Injured,Number Dead,Roadway Id,Intersecting Road,Corporate Limits?,Latitude,Longitude,Primary Factor,Manner of Collision,DateTime,Address Number,Cyclist Involved,Pedestrian Involved
0,1.0,0.0,0.0,E 17TH,,Y,39.180909,-86.525593,FAILURE TO YIELD RIGHT OF WAY,LEFT TURN,2022-01-13 11:08:00,,,True
1,1.0,0.0,0.0,E 3RD ST,,Y,39.164783,-86.528648,DRR DISTRACTED - EXPLAIN IN NARRATIVE,OTHER - EXPLAIN IN NARRATIVE,2022-10-13 13:31:00,,,
2,2.0,0.0,0.0,ST RD 46 W,W ARLINGTON,N,39.207648,-86.573792,FOLLOWING TOO CLOSELY,REAR END,2022-01-11 15:19:00,,,
3,2.0,0.0,0.0,N DUNN ST,E KIRKWOOD AVE,Y,39.166561,-86.528263,FAILURE TO YIELD RIGHT OF WAY,RIGHT ANGLE,2022-01-24 04:27:00,,,
4,1.0,0.0,0.0,ST RD 46 W,DEER PARK,Y,39.212153,-86.587526,ANIMAL/OBJECT IN RDWAY,COLLISION WITH DEER,2022-01-08 08:35:00,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76013,,0.0,0.0,DUNN,WHITE LOT WEST,,0.000000,0.000000,IMPROPER LN USAGE,,2003-10-06 17:00:00,,,
76014,,0.0,0.0,RED OAK,ST RD 446,,0.000000,0.000000,UNSAFE SPEED,,2003-11-03 08:00:00,,,
76015,,0.0,0.0,2ND ST,WALNUT,,0.000000,0.000000,BRAKE FAILURE OR DEFECTIVE,,2003-12-05 12:00:00,,,
76016,,0.0,0.0,NINETH,NORTH,,0.000000,0.000000,UNSAFE BACKING,,2003-12-01 07:00:00,,,


In [47]:
# % change from 2010 for diff crash severities
y10 = crash_df['Year'] == 2010
y22 = crash_df['Year'] == 2022
# combine masks with & rather than chaining boolean indexing
injured = crash_df['Number Injured'] > 0
dead = crash_df['Number Dead'] > 0

y10_injury_pct = crash_df[y10 & injured].shape[0] / crash_df[y10].shape[0]
y10_death_pct = crash_df[y10 & dead].shape[0] / crash_df[y10].shape[0]

y22_injury_pct = crash_df[y22 & injured].shape[0] / crash_df[y22].shape[0]
y22_death_pct = crash_df[y22 & dead].shape[0] / crash_df[y22].shape[0]

'pct change in crash deaths, 2010 - 2022:', (y22_death_pct - y10_death_pct) / y10_death_pct
'pct change in crash injuries, 2010 - 2022:', (y22_injury_pct - y10_injury_pct) / y10_injury_pct



('pct change in crash injuries, 2010 - 2022:', -0.1642369104163384)

In [50]:
# pedestrian-involved crashes only; masks are built on crash_df so they share its index
ped = crash_df['Pedestrian Involved'] == True
ped_injured = ped & (crash_df['Number Injured'] > 0)
ped_dead = ped & (crash_df['Number Dead'] > 0)

y10_injury_pct = crash_df[y10 & ped_injured].shape[0] / crash_df[y10 & ped].shape[0]
y10_death_pct = crash_df[y10 & ped_dead].shape[0] / crash_df[y10 & ped].shape[0]

y22_injury_pct = crash_df[y22 & ped_injured].shape[0] / crash_df[y22 & ped].shape[0]
y22_death_pct = crash_df[y22 & ped_dead].shape[0] / crash_df[y22 & ped].shape[0]

'pct change in pedestrian crash deaths, 2010 - 2022:', (y22_death_pct - y10_death_pct) / y10_death_pct
'pct change in pedestrian crash injuries, 2010 - 2022:', (y22_injury_pct - y10_injury_pct) / y10_injury_pct




('pct change in pedestrian crash deaths, 2010 - 2022:', 1.206896551724138)
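The values printed by these two cells are relative changes expressed as fractions, so an output of about 1.207 reads as roughly +120.7% and about -0.164 as roughly -16.4%. The sketch below spells out that arithmetic. The function `relative_change` is a hypothetical helper, and the two rates passed to it are made-up numbers chosen only to illustrate the calculation.

```python
def relative_change(old, new):
    """Relative change from old to new, as a fraction (0.25 means +25%)."""
    if old == 0:
        # with a zero baseline the ratio is undefined
        raise ValueError("relative change is undefined when the baseline is zero")
    return (new - old) / old

# hypothetical rates, not taken from the crash data
change = relative_change(0.20, 0.25)
print(f"{change:+.1%}")  # prints +25.0%
```

The zero-baseline check matters for small categories: if a year had no pedestrian deaths, the original expression would divide by zero.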

In [277]:
def series_avg(val1,val2):
    # despite the name, this returns val1 as a percentage of val2
    return val1 / val2 * 100

In [272]:
bike_ped_crashes = crash_df[((crash_df['Pedestrian Involved'] == True) | 
         (crash_df['Cyclist Involved'] == True)) & 
        (crash_df['Year'] > 2005)].groupby(by="Year").size().reset_index().rename(columns={
    0: "Bike/ped crashes"
})

In [280]:
bike_ped_injuries = crash_df[((crash_df['Pedestrian Involved'] == True) | 
         (crash_df['Cyclist Involved'] == True)) & 
        (crash_df['Number Injured'] > 0) & 
        (crash_df['Year'] > 2005)].groupby(by="Year").size().reset_index().rename(columns={
    0: "Bike/ped injuries"
})

bike_ped_injuries['Total bike/ped crashes'] = bike_ped_crashes['Bike/ped crashes']
bike_ped_injuries['Total crashes'] = crash_df[crash_df['Year'] > 2005].groupby(by="Year").size().reset_index()[0]
bike_ped_injuries['Total injuries'] = crash_df[crash_df['Year'] > 2005][['Year','Number Injured']].groupby(by="Year").sum().reset_index()['Number Injured']
bike_ped_injuries['bike/ped_pct'] = bike_ped_injuries.apply(lambda x: series_avg(x['Bike/ped injuries'],x['Total bike/ped crashes']), axis=1)
bike_ped_injuries['all_injuries_pct'] = bike_ped_injuries.apply(lambda x: series_avg(x['Total injuries'],x['Total crashes']), axis=1)

bike_ped_injuries

Unnamed: 0,Year,Bike/ped injuries,Total bike/ped crashes,Total crashes,Total injuries,bike/ped_pct,all_injuries_pct
0,2006,63,65,3937,1050.0,96.923077,26.670053
1,2007,104,115,4055,941.0,90.434783,23.205919
2,2008,76,89,4341,875.0,85.393258,20.156646
3,2009,78,88,4014,878.0,88.636364,21.873443
4,2010,99,115,4060,926.0,86.086957,22.807882
5,2011,110,118,3914,831.0,93.220339,21.231477
6,2012,111,128,4223,944.0,86.71875,22.353777
7,2013,72,90,3080,936.0,80.0,30.38961
8,2014,97,113,3289,971.0,85.840708,29.522651
9,2015,85,100,4055,1107.0,85.0,27.29963


In [316]:
bike_ped = crash_df[((crash_df['Pedestrian Involved'] == True) | 
         (crash_df['Cyclist Involved'] == True)) & 
        (crash_df['Year'] > 2005)].groupby(by="Year").size().reset_index().rename(columns={
    0: "Total bike/ped crashes"
})

# bike_ped

In [336]:
crash_df[((crash_df['Pedestrian Involved'] == True) | 
         (crash_df['Cyclist Involved'] == True)) &
         (crash_df['Number Dead'] > 0) 
        ].groupby(by='Year',dropna=False).sum()


Year,Vehicles Involved,Number Injured,Number Dead,Latitude,Longitude,Address Number,Month
2008,0.0,0.0,3.0,117.473638,-259.740053,0.0,20
2009,0.0,0.0,1.0,39.175674,-86.518987,0.0,9
2010,0.0,0.0,1.0,39.102732,-86.576999,0.0,12
2012,0.0,0.0,1.0,39.12145,-86.523827,0.0,12
2014,2.0,0.0,2.0,78.382589,-173.206988,0.0,19
2016,1.0,0.0,1.0,39.20704,-86.534704,0.0,7
2017,1.0,1.0,1.0,39.18723,-86.552182,0.0,6
2018,3.0,0.0,3.0,78.333424,-173.070891,0.0,33
2019,2.0,0.0,2.0,78.363098,-173.001543,0.0,14
2020,4.0,1.0,3.0,78.32888,-173.008496,0.0,23


In [335]:
ped03 = crash_df[(crash_df['Year'] == 2003) & 
                 (crash_df['Number Dead'] > 0) & 
                 (crash_df['Pedestrian Involved'] == True)
                ].sum()
ped03

TypeError: 'numpy.int64' object is not callable

In [291]:
bike_ped_injuries.to_csv('../data/charts/bike-ped-injuries.csv')
bike_ped_deaths.to_csv('../data/charts/bike-ped-deaths.csv')

In [371]:
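# share of bike/ped crashes that resulted in injury, per 100 bike/ped crashes, by year
source = bike_ped_injuries[['Year','bike/ped_pct']]
title='Pedestrian or cyclist injuries for every 100 crashes'
# use the years actually present in this table (2006 onward)
values=source['Year'].unique()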
x_axis='Year:O'
y_axis='bike/ped_pct:Q'

make_line_chart(source, title, values, x_axis, y_axis)

In [338]:
# what are the most dangerous intersections? 

fatal_crashes = crash_df['Number Dead'] > 0
injury_crashes = crash_df['Number Injured'] > 0

crash_df[injury_crashes|fatal_crashes].groupby(by=['Roadway Id','Intersecting Road']).size().sort_values(ascending=False)[:20]

Roadway Id       Intersecting Road
ST RD 37         ST RD 45             92
                 VERNAL PK            68
                 ST RD 48             61
ST RD 45/46      KINSER PK            56
ST RD 37         TAPP RD              55
                 VERNAL               50
ST RD 48-3RD ST  LIBERTY DR           47
N ST RD 37       VERNAL               47
ST RD 37         ST RD 37             40
ST RD 45/46      17TH ST              36
ST RD 48         CURRY PK             34
GRIMES LN        WALNUT ST            30
DILLMAN          ST RD 37             30
ST RD 37         DILLMAN RD           30
ST RD 45/46      10TH ST              29
FULLERTON        S ST RD 37           29
3RD ST           COLLEGE MALL         29
                 EAGLESON AVE         28
ST RD 45/46      WALNUT ST            27
ST RD 48         GATES DR             27
dtype: int64
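The ranking above treats the road pair as ordered, so the same intersection can appear twice, for example `DILLMAN` / `ST RD 37` and `ST RD 37` / `DILLMAN RD`. One partial remedy, sketched below, is to build an order-independent key before grouping. The helper `intersection_key` and the toy rows are illustrative assumptions; the sketch does not reconcile spelling variants such as `VERNAL` versus `VERNAL PK`, and rows with a missing `Intersecting Road` would need to be dropped first.

```python
import pandas as pd

def intersection_key(road_a, road_b):
    """Order-independent label for a pair of road names."""
    # normalize whitespace and case, then sort so (A, B) and (B, A) match
    names = sorted(str(name).strip().upper() for name in (road_a, road_b))
    return ' & '.join(names)

toy = pd.DataFrame({
    'Roadway Id':        ['ST RD 37', 'TAPP RD', 'ST RD 37'],
    'Intersecting Road': ['TAPP RD', 'ST RD 37', 'ST RD 45'],
})
toy['Intersection'] = [
    intersection_key(a, b)
    for a, b in zip(toy['Roadway Id'], toy['Intersecting Road'])
]
counts = toy.groupby('Intersection').size().sort_values(ascending=False)
print(counts)
```

In the toy data the first two rows collapse into a single `ST RD 37 & TAPP RD` entry with a count of 2.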

In [191]:
bikeped_crashes.shape

(63157, 16)

In [193]:
bikeped_crashes[ped_involved | bike_involved].groupby(by=['Roadway Id','Intersecting Road', 'Year']).size().sort_values(ascending=False)[:20]



Roadway Id    Intersecting Road  Year
7TH           EAGLESON AVE       2012    5
  7TH ST      EAGLESON AVE       2013    4
DUNN          KIRKWOOD AVE       2010    3
E 10TH        N SUNRISE DR       2019    3
  2ND ST      COLLEGE AVE        2018    3
KIRKWOOD AVE  WALNUT ST          2013    3
7TH           EAGLESON AVE       2007    3
  3RD ST      WOODLAWN AVE       2015    3
10TH ST       UNION ST           2017    3
EAGLESON AVE  JONES AVE          2017    3
ST RD 45/46   10TH ST            2017    2
  8TH ST      WALNUT ST          2013    2
HILLVIEW      N ST RD 37 RD      2009    2
ST RD 45/46   KINSER PK          2013    2
DUNN          KIRKWOOD AVE       2008    2
ST RD 45/46   KINSER PK          2015    2
                                 2016    2
10TH ST       CAMPBELL ST        2017    2
              EAGLESON AVE       2013    2
ST RD 45/46   KINSER PK          2017    2
dtype: int64

In [249]:
# na=False treats rows with a missing Primary Factor as non-matches
crash_df[crash_df['Primary Factor'].str.contains("DRR", na=False)]

Unnamed: 0,Vehicles Involved,Number Injured,Number Dead,Roadway Id,Intersecting Road,Corporate Limits?,Latitude,Longitude,Primary Factor,Manner of Collision,DateTime,Address Number,Cyclist Involved,Pedestrian Involved,Year,Month
1,1.0,0.0,0.0,E 3RD ST,,Y,39.164783,-86.528648,DRR DISTRACTED - EXPLAIN IN NARRATIVE,OTHER - EXPLAIN IN NARRATIVE,2022-10-13 13:31:00,,,,2022,10
18,1.0,0.0,0.0,CURRY PK,ROLL AVE,Y,39.147190,-86.582613,OTHER (DRR) - EXPLAIN IN NARRATIVE,RAN OFF RD,2022-01-13 12:52:00,,,,2022,1
31,2.0,0.0,0.0,S COLLEGE MALL R,,Y,39.153861,-86.495782,OTHER (DRR) - EXPLAIN IN NARRATIVE,OPPOSITE DIRECTION SIDESWIPE,2022-06-10 19:23:00,,,,2022,6
47,2.0,0.0,0.0,N MADISON ST,,Y,39.168457,-86.537424,OTHER (DRR) - EXPLAIN IN NARRATIVE,OPPOSITE DIRECTION SIDESWIPE,2022-08-06 00:00:00,,,,2022,8
61,2.0,1.0,0.0,W 3RD ST,S CURRY PK,Y,39.164651,-86.582897,OTHER (DRR) - EXPLAIN IN NARRATIVE,REAR END,2022-07-16 09:33:00,,,,2022,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75998,,0.0,0.0,8TH,9TH ST,,0.000000,0.000000,OTHER (DRR) - EXPLAIN IN NARRATIVE,,2003-08-06 11:00:00,,,,2003,8
76000,,0.0,0.0,FEE,N SIDE FIELDHOUSE,,0.000000,0.000000,OTHER (DRR) - EXPLAIN IN NARRATIVE,,2003-09-02 12:00:00,,,,2003,9
76005,,0.0,0.0,ST RD 45ST RD 46,WALNUT,,0.000000,0.000000,OTHER (DRR) - EXPLAIN IN NARRATIVE,,2003-09-01 18:00:00,,,,2003,9
76010,,0.0,0.0,IU LIBRARY,EAGLESON AVE,,0.000000,0.000000,OTHER (DRR) - EXPLAIN IN NARRATIVE,,2003-10-03 19:00:00,,,,2003,10


In [349]:
yearly_deaths = crash_df[['Year','Number Dead']].groupby(by='Year').sum()
yearly_deaths['Total Crashes'] = crash_df[['Year']].groupby(by='Year').size()
# how many deaths for every 100 crashes
yearly_deaths['Pct fatal'] = yearly_deaths.apply(lambda x: series_avg(x['Number Dead'],x['Total Crashes']),axis=1)

In [350]:
yearly_deaths.to_csv('../data/charts/annual-fatalities.csv')

In [369]:
yearly_injuries = crash_df[['Year','Number Injured']].groupby(by='Year').sum()
yearly_injuries['Total Crashes'] = crash_df[['Year']].groupby(by='Year').size()
# how many injuries for every 100 crashes
yearly_injuries['Pct injured'] = yearly_injuries.apply(lambda x: series_avg(x['Number Injured'],x['Total Crashes']),axis=1)
yearly_injuries

Year,Number Injured,Total Crashes,Pct injured
2003,985.0,4625,21.297297
2004,1230.0,4269,28.812368
2005,1107.0,3967,27.905218
2006,1050.0,3937,26.670053
2007,941.0,4055,23.205919
2008,875.0,4341,20.156646
2009,878.0,4014,21.873443
2010,926.0,4060,22.807882
2011,831.0,3914,21.231477
2012,944.0,4223,22.353777


In [370]:
yearly_injuries.to_csv('../data/charts/annual-injuries.csv')

In [373]:
crash_df[((crash_df['Pedestrian Involved'] == True) |
          (crash_df['Cyclist Involved'] == True)
         ) & (crash_df['Number Dead'] > 0)]

Unnamed: 0,Vehicles Involved,Number Injured,Number Dead,Roadway Id,Intersecting Road,Corporate Limits?,Latitude,Longitude,Primary Factor,Manner of Collision,DateTime,Address Number,Cyclist Involved,Pedestrian Involved,Year,Month
544,1.0,0.0,1.0,N WALNUT ST,E 12TH ST,Y,39.174286,-86.53357,OTHER (DRR) - EXPLAIN IN NARRATIVE,OTHER - EXPLAIN IN NARRATIVE,2022-09-18 01:49:00,,,True,2022,9
3330,1.0,0.0,1.0,W AIRPORT RD,S KIRBY RD,N,39.136906,-86.61518,DRR DISTRACTED - EXPLAIN IN NARRATIVE,RAN OFF RD,2022-06-10 20:48:00,,,True,2022,6
7000,1.0,0.0,1.0,S WASHINGTON ST,E 3RD ST,,39.16464,-86.5324,FAILURE TO YIELD RIGHT OF WAY,RIGHT TURN,2020-02-08 18:35:00,,,True,2020,2
8391,2.0,1.0,1.0,S INDIANA AVE,E 3RD ST,,,,DISREGARD SIGNAL/REG SIGN,RIGHT ANGLE,2020-09-29 09:30:00,,,True,2020,9
9007,1.0,0.0,1.0,ST RD 46,S PARK RIDGE RD,,39.16424,-86.476096,PEDESTRIAN ACTION,COLLISION WITH OBJECT IN RD,2020-12-21 19:00:00,,,True,2020,12
10162,1.0,0.0,1.0,ST RD 45,N RUSSELL RD,,39.198979,-86.467943,OTHER (DRR) - EXPLAIN IN NARRATIVE,OTHER - EXPLAIN IN NARRATIVE,2019-05-13 22:23:00,,True,True,2019,5
11232,1.0,0.0,1.0,S WALNUT,E 3RD ST,,39.164118,-86.5336,PEDESTRIAN ACTION,RIGHT TURN,2019-09-26 11:44:00,,,True,2019,9
12388,1.0,0.0,1.0,1ST,,,39.177744,-86.539755,UNSAFE BACKING,BACKING CRASH,2018-10-27 20:45:00,,,True,2018,10
14555,1.0,0.0,1.0,3RD ST,RUNKLE,,0.0,0.0,PEDESTRIAN ACTION,OBJECT IN RD,2018-12-21 18:04:00,,,True,2018,12
20157,1.0,0.0,1.0,CURRY PK,BEASLEY,,39.149152,-86.582704,PEDESTRIAN ACTION,HEAD ON BETWEEN TWO MOTOR VEHICLES,2014-11-06 05:06:00,,,True,2014,11
