In [1]:
import pandas as pd
import datetime as dt

In [2]:
crashes = pd.read_csv('../data/clean/crashes.csv')
crashes.describe()

         Unnamed: 0  accident_number  number_of_motor_vehicles  number_of_injuries  number_of_fatalities           zip          rpa        lat        long
count        7255.0           7255.0                    7255.0              7255.0                7255.0        7251.0       7251.0     7250.0      7250.0
mean    90779.89235    20195230000.0                  1.825086            0.441902                   0.0  37154.375672  1606.365329  36.238894  -86.724281
std    52031.087142       19831810.0                  0.737644            0.805726                   0.0      48.20615   577.562022   0.042792    0.022824
min            41.0    20170000000.0                       0.0                 0.0                   0.0       37013.0       1101.0    36.0733    -86.8177
25%         45370.5    20180350000.0                       2.0                 0.0                   0.0       37115.0       1425.0     36.197    -86.7416
50%         91194.0    20190590000.0                       2.0                 0.0                   0.0       37115.0       1607.0    36.2487    -86.7199
75%        136184.0    20210420000.0                       2.0                 1.0                   0.0       37206.0       1731.0    36.2644    -86.7115
max        180508.0    20240130000.0                       7.0                 8.0                   0.0       37228.0      15921.0    36.3196    -86.4438


In [3]:
crashes.isna().sum(axis = 0)

Unnamed: 0                    0
accident_number               0
date_and_time                 0
number_of_motor_vehicles      0
number_of_injuries            0
number_of_fatalities          0
hit_and_run                   0
collision_type_description    0
weather_description           0
illumination_description      0
harmfuldescriptions           0
street_address                0
city                          0
state                         0
zip                           4
rpa                           4
precinct                      0
lat                           5
long                          5
mapped_location               5
property_damage               0
date                          0
dtype: int64

Location data is essential for the first step, identifying "hot spots", so any nulls in these columns need to be addressed. The records with nulls in the location fields (`lat`, `long`, and `mapped_location`) do appear to be real events rather than errors, but at the time of this analysis they account for 5 of 7,255 records (about 0.07% of the data), so I don't believe removing them will negatively affect the overall analysis.

In [4]:
crashes = crashes.dropna()
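
Note that `dropna()` with no arguments removes a row if any column is null, including `zip` and `rpa`. A minimal sketch of restricting the drop to the location fields and reporting the share removed; the `sample` frame and its values are hypothetical, only the column names come from the notebook:

```python
import pandas as pd

# Hypothetical miniature of the crash table; the column names match the notebook.
sample = pd.DataFrame({
    'accident_number': [1, 2, 3, 4],
    'zip': [37206.0, None, 37115.0, 37216.0],
    'lat': [36.19, 36.20, None, 36.25],
    'long': [-86.74, -86.73, None, -86.71],
    'mapped_location': ['POINT (-86.74 36.19)', 'POINT (-86.73 36.2)',
                        None, 'POINT (-86.71 36.25)'],
})

location_cols = ['lat', 'long', 'mapped_location']

# Drop only the rows that lack coordinates; a missing zip alone does not remove a row.
located = sample.dropna(subset=location_cols)

# Share of rows removed, as a percentage of the original table.
n_removed = len(sample) - len(located)
pct_removed = 100 * n_removed / len(sample)
print(f'{n_removed} of {len(sample)} rows removed ({pct_removed:.2f}%)')
```

Whether to keep rows that only lack `zip` or `rpa` depends on how those fields are used later; the cell above takes the stricter route.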

In [5]:
crashes.isna().sum(axis = 0)

Unnamed: 0                    0
accident_number               0
date_and_time                 0
number_of_motor_vehicles      0
number_of_injuries            0
number_of_fatalities          0
hit_and_run                   0
collision_type_description    0
weather_description           0
illumination_description      0
harmfuldescriptions           0
street_address                0
city                          0
state                         0
zip                           0
rpa                           0
precinct                      0
lat                           0
long                          0
mapped_location               0
property_damage               0
date                          0
dtype: int64

In [6]:
crashes.to_csv('../data/clean/crashes.csv')
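
The `Unnamed: 0` column that shows up in the outputs is most likely the saved index coming back as a regular column: `to_csv` writes the index by default, and `read_csv` gives that unnamed first column the label `Unnamed: 0`. A small sketch of the round trip on a hypothetical one-column frame, using an in-memory buffer instead of a file:

```python
import io
import pandas as pd

frame = pd.DataFrame({'lat': [36.19, 36.25]})

# Default behaviour: the index is written as an unnamed first column ...
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread = pd.read_csv(with_index)  # ... and is read back as 'Unnamed: 0'.

# index=False leaves the index out, so the columns survive the round trip unchanged.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)

print(list(reread.columns))
print(list(clean.columns))
```

If nothing downstream relies on that column, passing `index=False` when saving keeps the cleaned files from gaining an extra column on every save.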

No fatalities are reported in the entire dataset, which is surprising. However, these reports are only as accurate as the officers recording them, who may be busy attending to those involved and filling the reports out quickly as soon as they arrive or after they've left. I will therefore ignore the column for this analysis but leave it in the dataset so it can be used in the future if numbers start showing up.

Now let's look at some other data sets that may have an impact on road safety.

First, 311 Complaints:

In [7]:
nash_311 = pd.read_csv('../data/clean/nash_311.csv')
nash_311.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11118 entries, 0 to 11117
Data columns (total 21 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Unnamed: 0                 11118 non-null  int64  
 1   case_number                11118 non-null  int64  
 2   status                     11118 non-null  object 
 3   case_request               11118 non-null  object 
 4   case_subrequest            11118 non-null  object 
 5   additional_subrequest      11118 non-null  object 
 6   date_time_opened           11118 non-null  object 
 7   date_time_closed           10994 non-null  object 
 8   case_origin                11118 non-null  object 
 9   state_issue                11118 non-null  bool   
 10  closed_when_created        11118 non-null  bool   
 11  incident_address           11118 non-null  object 
 12  incident_city              11118 non-null  object 
 13  incident_council_district  11118 non-null  obj

In [8]:
nash_311.isna().sum(axis = 0)

Unnamed: 0                     0
case_number                    0
status                         0
case_request                   0
case_subrequest                0
additional_subrequest          0
date_time_opened               0
date_time_closed             124
case_origin                    0
state_issue                    0
closed_when_created            0
incident_address               0
incident_city                  0
incident_council_district      0
incident_zip_code            204
latitude                      31
longitude                     31
mapped_location               31
contact_type                   0
parent_case                    0
preferred_language             0
dtype: int64

In [9]:
nash_311 = nash_311.dropna(subset=['latitude', 'longitude'])

In [10]:
nash_311.to_csv('../data/clean/nash_311.csv')

Short-term rental properties in the area (for the purpose of this study, all properties in zip codes 37206, 37216, and 37115) may influence driver behavior: guests staying there either drive themselves and may not be familiar with their surroundings, or use ride-share services whose drivers may also be unfamiliar with the area.

In [11]:
st_rentals = pd.read_csv('../data/clean/rentals.csv')
st_rentals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2872 entries, 0 to 2871
Data columns (total 28 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   Unnamed: 0                  2872 non-null   int64 
 1   permit                      2872 non-null   object
 2   applicant                   2872 non-null   object
 3   contact                     2871 non-null   object
 4   permit_subtype_description  2872 non-null   object
 5   permit_status               2872 non-null   object
 6   parcel                      2872 non-null   object
 7   date_entered                2872 non-null   object
 8   date_issued                 2565 non-null   object
 9   expiration_date             2872 non-null   object
 10  address                     2872 non-null   object
 11  city                        2872 non-null   object
 12  state                       2872 non-null   object
 13  zip                         2872 non-null   int6
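
If `rentals.csv` is not already limited to the study area, the zip-code definition above translates into a simple `isin` filter. This is a sketch on hypothetical permit rows; only the `zip` column name and the three zip codes come from the notebook:

```python
import pandas as pd

STUDY_ZIPS = [37206, 37216, 37115]

# Hypothetical permit rows; only the `zip` column is taken from the notebook.
rentals = pd.DataFrame({
    'permit': ['A-1', 'A-2', 'A-3', 'A-4'],
    'zip': [37206, 37013, 37115, 37216],
})

# Keep only the permits whose zip code falls inside the study area.
in_study_area = rentals[rentals['zip'].isin(STUDY_ZIPS)]
print(in_study_area['zip'].value_counts())
```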

Building permits and active right-of-way permits (construction occurring in the roadway) could cause congestion and unexpected slowdowns, which could lead to crashes.

In [12]:
bdlg_permits = pd.read_csv('../data/clean/bldg_permits.csv')

In [13]:
row_permits = pd.read_csv('../data/clean/row_permits.csv')

Nashville also maintains an inventory of pedestrian signals throughout the city, including information about their compliance with ADA regulations.

In [14]:
ped_inv = pd.read_csv('../data/clean/ped_inv.csv')

Now is a good time to pause and look at all of this on a map.<br><br>
(This is better done in a separate notebook, so this will serve as a stopping point for this one. The current table will be exported to a .csv file and used in the mapping notebook. Refer to `mapping.ipynb` for the overall map(s); next steps follow below.)

Now we'll look at the different categorical information in our datasets.

In [15]:
crashes.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7249 entries, 0 to 7254
Data columns (total 22 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Unnamed: 0                  7249 non-null   int64  
 1   accident_number             7249 non-null   int64  
 2   date_and_time               7249 non-null   object 
 3   number_of_motor_vehicles    7249 non-null   float64
 4   number_of_injuries          7249 non-null   float64
 5   number_of_fatalities        7249 non-null   float64
 6   hit_and_run                 7249 non-null   bool   
 7   collision_type_description  7249 non-null   object 
 8   weather_description         7249 non-null   object 
 9   illumination_description    7249 non-null   object 
 10  harmfuldescriptions         7249 non-null   object 
 11  street_address              7249 non-null   object 
 12  city                        7249 non-null   object 
 13  state                       7249 non-n

In [16]:
crashes['collision_type_description'].value_counts()

collision_type_description
ANGLE                                      2568
FRONT TO REAR                              2399
SIDESWIPE - SAME DIRECTION                 1029
NOT COLLISION W/MOTOR VEHICLE-TRANSPORT     658
HEAD-ON                                     211
SIDESWIPE - OPPOSITE DIRECTION              163
OTHER                                        90
REAR TO SIDE                                 57
UNKNOWN                                      40
REAR-TO-REAR                                 34
Name: count, dtype: int64

In [17]:
crashes['weather_description'].value_counts()

weather_description
CLEAR                     4959
CLOUDY                    1302
RAIN                       683
NAN                        238
SNOW                        28
UNKNOWN                     17
FOG                         12
OTHER (NARRATIVE)            4
SLEET, HAIL                  2
SEVERE CROSSWIND             1
BLOWING SAND/SOIL/DIRT       1
BLOWING SNOW                 1
SMOG, SMOKE                  1
Name: count, dtype: int64

As the overwhelming majority of crashes (6,261 of 7,249, roughly 86%) happened in clear or cloudy conditions, weather can be eliminated as a potential factor.
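
The share behind that statement can be computed directly from the counts above. The sketch below rebuilds the `value_counts()` result as a Series so it runs on its own; on the real data, `crashes['weather_description'].value_counts(normalize=True)` would give the same shares.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
weather_counts = pd.Series({
    'CLEAR': 4959, 'CLOUDY': 1302, 'RAIN': 683, 'NAN': 238, 'SNOW': 28,
    'UNKNOWN': 17, 'FOG': 12, 'OTHER (NARRATIVE)': 4, 'SLEET, HAIL': 2,
    'SEVERE CROSSWIND': 1, 'BLOWING SAND/SOIL/DIRT': 1, 'BLOWING SNOW': 1,
    'SMOG, SMOKE': 1,
})

shares = weather_counts / weather_counts.sum()
clear_or_cloudy = shares[['CLEAR', 'CLOUDY']].sum()
print(f'Clear or cloudy: {clear_or_cloudy:.1%}')

# 'NAN' and 'UNKNOWN' are text labels here, not real nulls, so isna() does not count them.
unrecorded = shares[['NAN', 'UNKNOWN']].sum()
print(f'Weather not recorded: {unrecorded:.1%}')
```

The `NAN` label is worth keeping in mind: the earlier `isna()` check reports zero nulls for this column because the missing values are stored as the string `NAN`.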

In [18]:
crashes['illumination_description'].value_counts()

illumination_description
DAYLIGHT                 4970
DARK - LIGHTED           1865
DUSK                      167
DARK - NOT LIGHTED        161
DAWN                       34
DARK-UNKNOWN LIGHTING      18
UNKNOWN                    14
NAN                        11
OTHER                       9
Name: count, dtype: int64

Same story for illumination, although it may be worth investigating the `DARK - NOT LIGHTED` instances to see if there is an issue with a particular area.
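
One way to start that investigation is to filter on the label and count crashes per address. This is a sketch on hypothetical rows; the addresses are placeholders, while the column names and the `DARK - NOT LIGHTED` label come from the notebook:

```python
import pandas as pd

# Hypothetical crash rows; the addresses are placeholders, not real locations.
sample = pd.DataFrame({
    'illumination_description': ['DARK - NOT LIGHTED', 'DAYLIGHT', 'DARK - NOT LIGHTED',
                                 'DARK - NOT LIGHTED', 'DARK - LIGHTED'],
    'street_address': ['EXAMPLE RD & 1ST ST', 'EXAMPLE RD & 2ND ST', 'EXAMPLE RD & 1ST ST',
                       'EXAMPLE RD & 3RD ST', 'EXAMPLE RD & 2ND ST'],
})

unlit = sample[sample['illumination_description'] == 'DARK - NOT LIGHTED']

# Addresses with the most unlit-dark crashes come first.
unlit_by_address = unlit['street_address'].value_counts()
print(unlit_by_address.head(10))
```

With only 161 such records in the data, any cluster would need a look on the map before drawing conclusions.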

In [19]:
crashes['harmfuldescriptions'].value_counts()

harmfuldescriptions
MOTOR VEHICLE IN TRANSPORT                                                                      6060
PARKED MOTOR VEHICLE                                                                             195
PEDESTRIAN                                                                                       183
UTILITY POLE                                                                                      98
MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE                                                   96
                                                                                                ... 
MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE;OTHER FIXED OBJECTS                                1
MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE;DITCH;OTHER FIXED OBJECTS;RAN OFF ROAD-RIGHT       1
PARKED MOTOR VEHICLE;LUMINAIRE/LIGHT SUPPORT                                                       1
OTHER OBJECT (NOT FIXED);FENCE                                         

Could not find info specific to Tennessee, but according to the state of Massachusetts, a "Collision with a motor vehicle in transport" means:<br><br>
"An event where a motor vehicle collides with another motor vehicle which is actively in motion on a roadway. This includes: motor vehicle in traffic on a highway, driverless motor vehicle in motion, motionless motor vehicle abandoned on a roadway, disabled motor vehicle on a roadway, etc."<br>(Source: https://masscrashreportmanual.com/vehicle/sequence-of-events-most-harmful-event/)<br><br>
So this count is simply telling us that the overwhelming majority of crashes involved a vehicle travelling down the road, as opposed to one that was not moving.<br><br>
It's worth looking at the collisions with pedestrians and parked vehicles. More on that later.
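
Because `harmfuldescriptions` packs several events into one semicolon-separated string, plain `value_counts()` treats every combination as its own category. A sketch of counting individual events instead, and of pulling out the pedestrian and parked-vehicle crashes, using hypothetical rows built from labels in the output above:

```python
import pandas as pd

# Hypothetical rows using labels that appear in the output above.
sample = pd.DataFrame({
    'harmfuldescriptions': [
        'MOTOR VEHICLE IN TRANSPORT',
        'MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE',
        'PEDESTRIAN',
        'PARKED MOTOR VEHICLE;LUMINAIRE/LIGHT SUPPORT',
    ]
})

# One row per individual event: split on ';' and explode the resulting lists.
events = sample['harmfuldescriptions'].str.split(';').explode().str.strip()
event_counts = events.value_counts()
print(event_counts)

# Crashes that mention a pedestrian or a parked vehicle anywhere in the sequence.
mask = sample['harmfuldescriptions'].str.contains('PEDESTRIAN|PARKED MOTOR VEHICLE', regex=True)
ped_or_parked = sample[mask]
print(len(ped_or_parked))
```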

Now let's look at the different types of 311 complaints that have been made in the area.

In [20]:
nash_311.info()

<class 'pandas.core.frame.DataFrame'>
Index: 11087 entries, 0 to 11117
Data columns (total 21 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Unnamed: 0                 11087 non-null  int64  
 1   case_number                11087 non-null  int64  
 2   status                     11087 non-null  object 
 3   case_request               11087 non-null  object 
 4   case_subrequest            11087 non-null  object 
 5   additional_subrequest      11087 non-null  object 
 6   date_time_opened           11087 non-null  object 
 7   date_time_closed           10966 non-null  object 
 8   case_origin                11087 non-null  object 
 9   state_issue                11087 non-null  bool   
 10  closed_when_created        11087 non-null  bool   
 11  incident_address           11087 non-null  object 
 12  incident_city              11087 non-null  object 
 13  incident_council_district  11087 non-null  object 


In [21]:
nash_311['case_request'].value_counts()

case_request
STREETS, ROADS & SIDEWALKS                4102
PUBLIC SAFETY                             3375
COVID-19                                   940
TRASH, RECYCLING & LITTER                  913
PROPERTY VIOLATIONS                        772
TRANSIT                                    424
RESOLVED BY HUBNASHVILLE ON FIRST CALL     198
ELECTRIC & WATER GENERAL                   136
OTHER METRO SERVICES AND FORMS              85
PERMITS                                     41
HANDS ON VOLUNTEERS                         40
TREES                                       22
PARKS                                       18
OTHER                                        8
PLANNING & ZONING                            6
STORM RELIEF                                 3
EDUCATION & LIBRARIES                        2
PUBLIC RECORDS REQUEST                       1
SOCIAL SERVICES & HOUSING                    1
Name: count, dtype: int64

Interestingly, there are 4,102 entries related to road and sidewalk conditions, more than half the number of crashes along the corridor (7,249). Let's dig into those.

In [22]:
nash_311[nash_311['case_request'] == 'Streets, Roads & Sidewalks']['case_subrequest'].value_counts()

Series([], Name: count, dtype: int64)
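
The empty result above looks like a letter-case mismatch: the `case_request` labels in the previous output are upper case, while the filter compares against a title-case string. A sketch of a case-insensitive comparison follows; the sample rows are hypothetical, and the casing of the `case_subrequest` labels is an assumption, since it is not shown in the notebook:

```python
import pandas as pd

# Hypothetical 311 rows. The upper-case request labels match the value_counts()
# output above; the sub-request labels and their casing are assumptions.
sample = pd.DataFrame({
    'case_request': ['STREETS, ROADS & SIDEWALKS', 'PUBLIC SAFETY',
                     'STREETS, ROADS & SIDEWALKS'],
    'case_subrequest': ['TRAFFIC CALMING', 'OTHER', 'PAVING REQUEST'],
})

# An exact comparison is case sensitive, so the title-case literal matches nothing.
exact = sample[sample['case_request'] == 'Streets, Roads & Sidewalks']

# casefold() on both sides makes the comparison independent of letter case.
target = 'Streets, Roads & Sidewalks'.casefold()
streets = sample[sample['case_request'].str.casefold() == target]
print(streets['case_subrequest'].value_counts())

# The same idea for a list of labels passed to isin().
wanted = [label.casefold() for label in ['Traffic Calming', 'Paving Request']]
impr = sample[sample['case_subrequest'].str.casefold().isin(wanted)]
print(len(impr))
```

Normalizing both sides avoids having to remember which casing the cleaning step left in each column.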

While there aren't many, several subcategories related to infrastructure issues include requests for improvements. Plotting those on a map and comparing against crash locations will tell us where residents think the problem areas are.

In [23]:
impr_requests = nash_311[nash_311['case_subrequest'].isin(['Request New Sign', 'Traffic Engineering', 'Paving Request', 'Request a Speed Monitor Trailer', 'Traffic Light Timing', 'Request New Signal', 'Traffic Calming', 'Request for a New/Improved Bikeway', 'Request Warning Sign'])]
impr_requests.to_csv('../data/clean/impr_requests.csv')
impr_requests.head()

Unnamed: 0.1,Unnamed: 0,case_number,status,case_request,case_subrequest,additional_subrequest,date_time_opened,date_time_closed,case_origin,state_issue,...,incident_address,incident_city,incident_council_district,incident_zip_code,latitude,longitude,mapped_location,contact_type,parent_case,preferred_language


Another consideration is traffic signal equipment and whether or not proper pedestrian facilities exist.

In [24]:
ped_inv = pd.read_csv('../data/clean/ped_inv.csv')
ped_inv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239 entries, 0 to 238
Data columns (total 29 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       239 non-null    int64  
 1   intersectionid   239 non-null    int64  
 2   onst             239 non-null    object 
 3   crst             239 non-null    object 
 4   quad             239 non-null    object 
 5   inspector        239 non-null    object 
 6   date             239 non-null    object 
 7   evnt_lat         238 non-null    float64
 8   evnt_lon         238 non-null    float64
 9   evnt_map         239 non-null    object 
 10  evnt_type        148 non-null    object 
 11  ped_signal_pres  239 non-null    bool   
 12  oa_cmnt          239 non-null    object 
 13  mapped_location  238 non-null    object 
 14  cpl_size         148 non-null    object 
 15  cpl_force        148 non-null    object 
 16  cpl_op           148 non-null    object 
 17  cpl_ctrst       

Taking this to Tableau and comparing the number of injuries and the number of vehicles involved in crashes by location, a few intersections jump out as "hot spots". See the Tableau dashboard for more information.
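
A rough version of the same comparison can be sketched in pandas as a cross-check on the dashboard. The rows below are hypothetical, and grouping crashes by coordinates rounded to three decimals (cells on the order of 100 m) is an assumption made for illustration, not the method used in Tableau:

```python
import pandas as pd

# Hypothetical crash rows; the column names match the notebook.
sample = pd.DataFrame({
    'lat': [36.2487, 36.2489, 36.1970, 36.2644],
    'long': [-86.7199, -86.7201, -86.7416, -86.7115],
    'number_of_injuries': [1, 2, 0, 1],
    'number_of_motor_vehicles': [2, 3, 2, 1],
})

# Assumption: rounding to three decimals groups nearby crashes into one cell.
cells = sample.assign(lat_cell=sample['lat'].round(3),
                      long_cell=sample['long'].round(3))

# Count crashes and total injuries and vehicles per cell, worst cells first.
hot_spots = (
    cells.groupby(['lat_cell', 'long_cell'])
         .agg(crashes=('number_of_injuries', 'size'),
              injuries=('number_of_injuries', 'sum'),
              vehicles=('number_of_motor_vehicles', 'sum'))
         .sort_values(['injuries', 'crashes'], ascending=False)
)
print(hot_spots.head())
```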