In [None]:
import pandas as pd
import datetime as dt

In [None]:
crash_data = pd.read_csv('../data/clean/crashes.csv')
crash_data.describe()

In [None]:
crash_data.isna().sum(axis = 0)

Location data is important for the first step of identifying "hot spots", so any nulls in these columns will need to be addressed. The records with nulls in the location fields (`lat`, `long`, and `mapped_location`) appear to be actual events rather than errors, but at the time of this analysis they represent 0.04% of the data, so I don't believe removing them will negatively impact the overall analysis.
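
As a rough check on that share, the sketch below counts the rows missing any location field and drops only those rows. The toy frame is made-up data standing in for `crash_data`; only the column names `lat`, `long`, `mapped_location`, and `weather_description` come from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for crash_data; only the column names come from the real file.
toy = pd.DataFrame({
    "lat": [36.17, np.nan, 36.19, 36.20],
    "long": [-86.75, np.nan, -86.74, -86.73],
    "mapped_location": ["POINT (-86.75 36.17)", None,
                        "POINT (-86.74 36.19)", "POINT (-86.73 36.20)"],
    "weather_description": ["CLEAR", "CLEAR", None, "RAIN"],
})

location_cols = ["lat", "long", "mapped_location"]

# A row counts as missing location if any of the location fields is null.
missing_location = toy[location_cols].isna().any(axis=1)
pct_missing = missing_location.mean() * 100

# Drop only the rows without a location; nulls in other columns are kept.
toy_located = toy.dropna(subset=location_cols)
```

Note that `dropna()` called without `subset` also removes rows that have nulls in non-location columns, so the two approaches can leave different row counts.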

In [None]:
crash_data = crash_data.dropna()

In [None]:
crash_data.isna().sum(axis = 0)

In [None]:
crash_data.describe()

No fatalities are reported in the entire dataset, which is surprising. However, these reports are only as accurate as the officers recording them, who may be busy attending to those involved and filling the reports out quickly as soon as they arrive or after they've left. I will therefore ignore the column for this analysis but leave it in the dataset, so it can be used in the future if numbers start showing up.

Now let's look at some other data sets that may have an impact on road safety.

First, 311 Complaints:

In [None]:
nash_311 = pd.read_csv('../data/clean/nash_311.csv')
nash_311.info()

In [None]:
nash_311.isna().sum(axis = 0)

In [None]:
#nash_311 = nash_311.dropna(subset=['latitude', 'longitude'])

Short-term rental properties in the area (for the purpose of this study, defined as all properties in zip codes 37206, 37216, and 37115) may influence driver behavior, as guests staying there are either driving themselves and may not be familiar with their surroundings, or are using ride-share services whose drivers may also be unfamiliar with the area.
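
A minimal sketch of that zip-code filter, assuming the rentals table stores the zip code in a column named `zip` (a hypothetical name; `st_rentals.info()` will show the real one). The rows below are made up.

```python
import pandas as pd

# Zip codes that define the study area, taken from the text above.
STUDY_ZIPS = {"37206", "37216", "37115"}

# Made-up rows; `permit_id` and `zip` are hypothetical column names.
toy_rentals = pd.DataFrame({
    "permit_id": [1, 2, 3, 4],
    "zip": [37206, 37203, "37115", "37216-1234"],
})

# Normalize to 5-character strings so integers and ZIP+4 values compare cleanly.
zip5 = toy_rentals["zip"].astype(str).str.strip().str[:5]
in_study_area = toy_rentals[zip5.isin(STUDY_ZIPS)]
```

Comparing on a normalized string avoids silently missing rows when the column mixes integers, strings, and ZIP+4 values.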

In [None]:
st_rentals = pd.read_csv('../data/clean/rentals.csv')
st_rentals.info()

Building permits and active right-of-way permits (construction occurring in the roadway) could cause congestion and unexpected slowdowns, which could lead to crashes.

In [None]:
bldg_permits = pd.read_csv('../data/clean/bldg_permits.csv')

In [None]:
row_permits = pd.read_csv('../data/clean/row_permits.csv')

Nashville also maintains an inventory of pedestrian signals throughout the city, including information about their compliance with ADA regulations.

In [None]:
ped_inv = pd.read_csv('../data/clean/ped_inv.csv')

Now is a good time to pause and look at all of this on a map.<br><br>
(This is better done in a separate notebook, so this will serve as a stopping point for this one. The current table will be exported to a .csv file and used in the mapping notebook. Refer to `mapping.ipynb` for the overall map(s); the next steps follow below.)

Now we'll look at the different categorical information in our datasets.

In [None]:
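crashes = crash_data  # assumption: `crashes` is not defined earlier here, so reuse the cleaned crash table
crashes.info()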

In [None]:
crashes['collision_type_description'].value_counts()

In [None]:
crashes['weather_description'].value_counts()

As the overwhelming majority of crashes happen in clear or cloudy conditions, weather can be eliminated as a potential factor.
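
One way to put a number on "overwhelming majority" is to look at shares rather than raw counts. This sketch uses made-up labels in place of the real `weather_description` column, so the resulting figures are illustrative only.

```python
import pandas as pd

# Made-up weather labels standing in for crashes['weather_description'].
weather = pd.Series(["CLEAR"] * 7 + ["CLOUDY"] * 2 + ["RAIN"] * 1)

# normalize=True returns each category's share of all rows instead of a count.
shares = weather.value_counts(normalize=True)

# Combined share of crashes in clear or cloudy conditions.
clear_or_cloudy = shares[["CLEAR", "CLOUDY"]].sum()
```

The same `normalize=True` call works on the illumination and collision-type columns.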

In [None]:
crashes['illumination_description'].value_counts()

Same story for illumination, although it may be worth investigating the `DARK - NOT LIGHTED` instances to see if there is an issue with a particular area.
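
A sketch of how the `DARK - NOT LIGHTED` crashes could be isolated and counted by place. The `intersection` column is hypothetical and the rows are made up; only the `illumination_description` column name and the `DARK - NOT LIGHTED` label come from the data.

```python
import pandas as pd

# Made-up rows; `intersection` is a hypothetical location column.
toy_crashes = pd.DataFrame({
    "illumination_description": [
        "DAYLIGHT", "DARK - NOT LIGHTED", "DARK - LIGHTED",
        "DARK - NOT LIGHTED", "DARK - NOT LIGHTED",
    ],
    "intersection": ["A", "B", "A", "B", "C"],
})

# Keep only crashes recorded as happening in unlit darkness.
unlit = toy_crashes[toy_crashes["illumination_description"] == "DARK - NOT LIGHTED"]

# Count unlit crashes per location, most frequent first.
unlit_by_place = unlit["intersection"].value_counts()
```

A location that dominates this count would be a candidate for a closer look at street lighting.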

In [None]:
crashes['harmfuldescriptions'].value_counts()

I could not find a definition specific to Tennessee, but according to the state of Massachusetts, a "Collision with a motor vehicle in transport" means:<br><br>
"An event where a motor vehicle collides with another motor vehicle which is actively in motion on a roadway. This includes: motor vehicle in traffic on a highway, driverless motor vehicle in motion, motionless motor vehicle abandoned on a roadway, disabled motor vehicle on a roadway, etc."<br>(Source: https://masscrashreportmanual.com/vehicle/sequence-of-events-most-harmful-event/)<br><br>
So this count is simply telling us that the overwhelming majority of crashes involved a vehicle travelling down the road, as opposed to one that was not moving.<br><br>
It's worth looking at the collisions with pedestrians and parked vehicles. More on that later.
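
As a starting point for that later look, the sketch below pulls out rows whose harmful-event description mentions a pedestrian or a parked vehicle. The label strings are made up for illustration; the real values should be taken from the `value_counts()` output above.

```python
import pandas as pd

# Made-up labels standing in for crashes['harmfuldescriptions'].
toy = pd.DataFrame({"harmfuldescriptions": [
    "MOTOR VEHICLE IN TRANSPORT",
    "PEDESTRIAN",
    "PARKED MOTOR VEHICLE",
    None,
    "MOTOR VEHICLE IN TRANSPORT - PEDESTRIAN",
]})

# Match either keyword; na=False treats missing descriptions as non-matches.
mask = toy["harmfuldescriptions"].str.contains("PEDESTRIAN|PARKED", case=False, na=False)
ped_or_parked = toy[mask]
```

Substring matching is used here because a single record may list more than one harmful event in the same field.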

Now let's look at the different types of 311 complaints that have been made in the area.

In [None]:
nash_311.info()

In [None]:
nash_311['case_request'].value_counts()

Interestingly, there are almost 30,000 entries related to road and sidewalk conditions - over 4 times the number of crashes along the corridor... Let's dig into those.

In [None]:
nash_311[nash_311['case_request'] == 'Streets, Roads & Sidewalks']['case_subrequest'].value_counts()

While there aren't many, several subcategories related to infrastructure issues include requests for improvements. Plotting those on a map and comparing against crash locations will tell us where residents think the problem areas are.

In [None]:
# Subrequest types that ask for an infrastructure improvement
impr_types = [
    'Request New Sign', 'Traffic Engineering', 'Paving Request',
    'Request a Speed Monitor Trailer', 'Traffic Light Timing',
    'Request New Signal', 'Traffic Calming',
    'Request for a New/Improved Bikeway', 'Request Warning Sign',
]
impr_requests = nash_311[nash_311['case_subrequest'].isin(impr_types)]
impr_requests.to_csv('../data/clean/impr_requests.csv')
impr_requests.head()

Another consideration is traffic signal equipment and whether or not proper pedestrian facilities exist.

In [None]:
ped_inv = pd.read_csv('../data/clean/ped_inv.csv')
ped_inv.info()

In [None]:
ped_inv

In [None]:
# na=False treats missing street names as non-matches instead of raising on NaN
ped_inv = ped_inv[ped_inv['onst'].str.contains('GALLATIN|MAIN', na=False)]
ped_inv

Taking this to Tableau and comparing the number of injuries and the number of vehicles involved in crashes by location, a few intersections jump out as "hot spots". See the Tableau dashboard for more information.