The traffic accident data is the key to this analysis. To make sure it's current, instead of downloading a static .csv file it, along with several other datasets from the City of Nashville's OpenData portal, will be periodically downloaded and updated via APIs.

To avoid too many pull requests, all of the data will be pulled here and the relevant information will be exported for use in a separate notebook for EDA.

** This notebook will be revisited and edited as data needs may change for this analysis.

In [1]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Needed to pip install sodapy first
from sodapy import Socrata 

**Data Source:** https://data.nashville.gov/Police/Traffic-Accidents/6v6w-hpcw/about_data

In [2]:
client = Socrata("data.nashville.gov", None)



In [3]:
crashes_export = client.get("6v6w-hpcw", limit=200000)
crashes_raw = pd.DataFrame.from_records(crashes_export)
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,reporting_officer,collision_type,collision_type_description,weather,...,rpa,precinct,lat,long,mapped_location,:@computed_region_wvby_4s8j,:@computed_region_3aw5_2wv7,:@computed_region_p6sk_2acq,:@computed_region_gxvr_9jxz,property_damage
0,20240113671,2024-02-18T18:00:00.000,2,0,0,False,4008412,5,SIDESWIPE - SAME DIRECTION,21,...,8927,SOUTH,36.0488,-86.613,"{'type': 'Point', 'coordinates': [-86.613, 36....",1,1,33,15,
1,20240113561,2024-02-18T17:38:00.000,2,0,0,False,4007344,4,ANGLE,21,...,5325,NORTH,36.1691,-86.8096,"{'type': 'Point', 'coordinates': [-86.8096, 36...",1,1,25,51,
2,20240113525,2024-02-18T17:23:00.000,1,2,0,False,4008013,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,21,...,9561,HERMIT,36.1798,-86.6134,"{'type': 'Point', 'coordinates': [-86.6134, 36...",1,2,7,44,
3,20240113457,2024-02-18T16:50:00.000,2,0,0,False,4007988,4,ANGLE,21,...,4105,CENTRA,36.1615,-86.776,"{'type': 'Point', 'coordinates': [-86.776, 36....",1,1,20,29,
4,20240113169,2024-02-18T12:49:00.000,2,0,0,False,4004452,11,Front to Rear,21,...,1801,EAST,36.2347,-86.7248,"{'type': 'Point', 'coordinates': [-86.7248, 36...",1,1,15,23,


In [4]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180299 entries, 0 to 180298
Data columns (total 29 columns):
 #   Column                       Non-Null Count   Dtype 
---  ------                       --------------   ----- 
 0   accident_number              180299 non-null  object
 1   date_and_time                180299 non-null  object
 2   number_of_motor_vehicles     180298 non-null  object
 3   number_of_injuries           180299 non-null  object
 4   number_of_fatalities         180299 non-null  object
 5   hit_and_run                  180285 non-null  object
 6   reporting_officer            180288 non-null  object
 7   collision_type               180289 non-null  object
 8   collision_type_description   180289 non-null  object
 9   weather                      174248 non-null  object
 10  weather_description          174248 non-null  object
 11  illuaccidemination           180019 non-null  object
 12  illumination_description     180019 non-null  object
 13  harmfulcodes  

Now that the DataFrame has been created, some columns can be removed as they're not relevant to this analysis.

In [5]:
crashes_raw = crashes_raw.drop(['reporting_officer', 'collision_type', 'illuaccidemination', 'harmfulcodes', ':@computed_region_wvby_4s8j', ':@computed_region_3aw5_2wv7', ':@computed_region_p6sk_2acq', ':@computed_region_gxvr_9jxz', 'weather'], axis=1)

In [6]:
crashes_raw['date'] = pd.to_datetime(crashes_raw['date_and_time']).dt.date
crashes_raw = crashes_raw.astype({'number_of_motor_vehicles': 'float',
                    'number_of_injuries': 'float',
                    'number_of_fatalities': 'float',
                    'hit_and_run': 'bool',
                    'property_damage': 'bool'})

In [7]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180299 entries, 0 to 180298
Data columns (total 21 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   accident_number             180299 non-null  object 
 1   date_and_time               180299 non-null  object 
 2   number_of_motor_vehicles    180298 non-null  float64
 3   number_of_injuries          180299 non-null  float64
 4   number_of_fatalities        180299 non-null  float64
 5   hit_and_run                 180299 non-null  bool   
 6   collision_type_description  180289 non-null  object 
 7   weather_description         174248 non-null  object 
 8   illumination_description    180019 non-null  object 
 9   harmfuldescriptions         178228 non-null  object 
 10  street_address              180294 non-null  object 
 11  city                        180299 non-null  object 
 12  state                       180298 non-null  object 
 13  zip           

For some initial cleaning, all text fields should be converted to the same case, in this case UPPER

In [8]:
crashes_raw['collision_type_description'] = crashes_raw['collision_type_description'].astype(str).str.upper()
crashes_raw['illumination_description'] = crashes_raw['illumination_description'].astype(str).str.upper()
crashes_raw['harmfuldescriptions'] = crashes_raw['harmfuldescriptions'].astype(str).str.upper()
crashes_raw['street_address'] = crashes_raw['street_address'].astype(str).str.upper()
crashes_raw['city'] = crashes_raw['city'].astype(str).str.upper()
crashes_raw['state'] = crashes_raw['state'].astype(str).str.upper()
crashes_raw['precinct'] = crashes_raw['precinct'].astype(str).str.upper()
crashes_raw['weather_description'] = crashes_raw['weather_description'].astype(str).str.upper()
crashes_raw['property_damage'] = crashes_raw['property_damage'].astype(str).str.upper()

In [9]:
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,collision_type_description,weather_description,illumination_description,harmfuldescriptions,...,city,state,zip,rpa,precinct,lat,long,mapped_location,property_damage,date
0,20240113671,2024-02-18T18:00:00.000,2.0,0.0,0.0,False,SIDESWIPE - SAME DIRECTION,CLEAR,DARK-UNKNOWN LIGHTING,MOTOR VEHICLE IN TRANSPORT,...,ANTIOCH,TN,37013,8927,SOUTH,36.0488,-86.613,"{'type': 'Point', 'coordinates': [-86.613, 36....",True,2024-02-18
1,20240113561,2024-02-18T17:38:00.000,2.0,0.0,0.0,False,ANGLE,CLEAR,DUSK,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37208,5325,NORTH,36.1691,-86.8096,"{'type': 'Point', 'coordinates': [-86.8096, 36...",True,2024-02-18
2,20240113525,2024-02-18T17:23:00.000,1.0,2.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DAYLIGHT,DITCH;RAN OFF ROAD-RIGHT,...,HERMITAGE,TN,37076,9561,HERMIT,36.1798,-86.6134,"{'type': 'Point', 'coordinates': [-86.6134, 36...",True,2024-02-18
3,20240113457,2024-02-18T16:50:00.000,2.0,0.0,0.0,False,ANGLE,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37201,4105,CENTRA,36.1615,-86.776,"{'type': 'Point', 'coordinates': [-86.776, 36....",True,2024-02-18
4,20240113169,2024-02-18T12:49:00.000,2.0,0.0,0.0,False,FRONT TO REAR,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37216,1801,EAST,36.2347,-86.7248,"{'type': 'Point', 'coordinates': [-86.7248, 36...",True,2024-02-18


Last, this analysis is only concerned with crashes in the vicinity of Main Street / Gallatin Avenue / Gallatin Pike in East Nashville & Madison. To filter the data, a list of points needs to be created to essentially trace the path of the corridor. From there, only data points within, say, a 150' radius of that line will be included for analysis. I can't find a better way to do this than selecting points on Google Maps and manually entering the coordinates into a list. Refer to `mapping.ipynb`

In [10]:
east_nash_crashes = crashes_raw[crashes_raw['zip'].isin(['37206', '37216', '37115'])]
east_nash_crashes

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,collision_type_description,weather_description,illumination_description,harmfuldescriptions,...,city,state,zip,rpa,precinct,lat,long,mapped_location,property_damage,date
4,20240113169,2024-02-18T12:49:00.000,2.0,0.0,0.0,False,FRONT TO REAR,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37216,1801,EAST,36.2347,-86.7248,"{'type': 'Point', 'coordinates': [-86.7248, 36...",TRUE,2024-02-18
15,20240112411,2024-02-18T00:51:00.000,2.0,0.0,0.0,False,ANGLE,CLEAR,DARK - LIGHTED,MOTOR VEHICLE IN TRANSPORT,...,MADISON,TN,37115,1701,MADISO,36.2631,-86.7119,"{'type': 'Point', 'coordinates': [-86.7119, 36...",TRUE,2024-02-18
19,20240112142,2024-02-17T20:49:00.000,1.0,0.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DARK - LIGHTED,NAN,...,MADISON,TN,37115,1701,MADISO,36.2638,-86.7117,"{'type': 'Point', 'coordinates': [-86.7117, 36...",TRUE,2024-02-17
31,20240111599,2024-02-17T18:11:00.000,2.0,0.0,0.0,True,FRONT TO REAR,CLEAR,DARK - LIGHTED,MOTOR VEHICLE IN TRANSPORT,...,MADISON,TN,37115,2017,MADISO,36.2563,-86.7579,"{'type': 'Point', 'coordinates': [-86.7579, 36...",TRUE,2024-02-17
63,20240110010,2024-02-16T21:36:00.000,2.0,0.0,0.0,False,ANGLE,RAIN,DARK - NOT LIGHTED,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37206,1999,EAST,36.1764,-86.7594,"{'type': 'Point', 'coordinates': [-86.7594, 36...",TRUE,2024-02-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180210,20170001270,2017-01-01T14:59:00.000,2.0,0.0,0.0,False,FRONT TO REAR,RAIN,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,...,NASHVILLE,TN,37206,1925,EAST,36.1847,-86.7583,"{'type': 'Point', 'coordinates': [-86.7583, 36...",TRUE,2017-01-01
180219,20170001226,2017-01-01T14:33:00.000,2.0,2.0,0.0,False,FRONT TO REAR,RAIN,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,...,MADISON,TN,37115,1713,MADISO,36.2721,-86.6890,"{'type': 'Point', 'coordinates': [-86.689, 36....",TRUE,2017-01-01
180244,20170000705,2017-01-01T07:59:00.000,1.0,2.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DAYLIGHT,GUARDRAIL FACE,...,MADISON,TN,37115,20044,MADISO,36.2481,-86.7430,"{'type': 'Point', 'coordinates': [-86.743, 36....",TRUE,2017-01-01
180258,20170000450,2017-01-01T03:47:00.000,1.0,0.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DARK - LIGHTED,WALL,...,NASHVILLE,TN,37216,1449,EAST,36.2095,-86.7135,"{'type': 'Point', 'coordinates': [-86.7135, 36...",TRUE,2017-01-01


The resulting table is exported to a .csv file for use in the EDA notebook.

In [11]:
east_nash_crashes.to_csv('../data/clean/east_nash_crashes.csv')

Now repeat the process for active ROW permits.<br><br>
Source: https://data.nashville.gov/Licenses-Permits/Active-Right-of-Way-Permits/a5tp-4w2v/about_data

In [12]:
row_permits_export = client.get("a5tp-4w2v", limit=2000000)
row_permits_raw = pd.DataFrame.from_records(row_permits_export)
row_permits_raw.head()

Unnamed: 0,initiated_date,permit,permit_type,permit_description,status,on_street,from_street,to_street,location_address,city,...,company,days_to_work,latitude,longitude,mapped_location,:@computed_region_wvby_4s8j,:@computed_region_3aw5_2wv7,:@computed_region_p6sk_2acq,:@computed_region_gxvr_9jxz,:@computed_region_b9k3_hpc2
0,2023-02-09T11:47:34.000,202305185,DRIVEWAY PERMIT,POURING ENTRANCE AND APRON TO DRIVEWAY,ACTIVE,4108 WESTLAWN DR,WESTLAWN PL,WESTLAWN CT,4108 WESTLAWN DR,NASHVILLE,...,Christy Conway,Monday Tuesday Wednesday Thursday Friday Satur...,36.1393235314126,-86.836318688412,"{'type': 'Point', 'coordinates': [-86.83631868...",1,1,10,46,5
1,2023-06-02T11:01:59.000,202320188,DRIVEWAY PERMIT,RESIDENTIAL DRIVEWAY CONSTRUCTION,ACTIVE,0 KNIGHT DR,MID-BLOCK,JUDY CREEK RD,0 KNIGHT DR,WHITES CREEK,...,Michael L Shular,Monday Tuesday Wednesday Thursday Friday Satur...,36.2626428157731,-86.8253323554335,"{'type': 'Point', 'coordinates': [-86.82533235...",1,2,16,6,1
2,2022-10-06T09:36:29.000,202236666,EXCAVATION PERMIT,STORMWATER INSTALL,ACTIVE,KINGS LN,PHIPPS DR,DRAKES BRANCH RD,KINGS LN,NASHVILLE,...,Middle Tennessee Infrastructure,Tuesday Wednesday Thursday Friday Saturday,36.2175270092533,-86.846531521226,"{'type': 'Point', 'coordinates': [-86.84653152...",1,1,1,34,1
3,2023-04-13T09:09:13.000,202313772,EXCAVATION PERMIT,SEWER/WATER TAP,ACTIVE,4317 CATO RD,CATO RD,GILMORE CROSSING LN,4317 CATO RD,NASHVILLE,...,"R & A PLUMBING, LLC",Monday Tuesday Wednesday Thursday Friday Satur...,36.2138889523239,-86.8666383650662,"{'type': 'Point', 'coordinates': [-86.86663836...",1,1,1,34,1
4,2023-05-15T10:27:04.000,202317822,EXCAVATION PERMIT,PLACING ANCHOR\nJOB# A02G791 (NN)\nSTEP: 1.7,ACTIVE,META DR,WESTCREST DR,ELLENWOOD DR,META DR,NASHVILLE,...,STAR CONSTRUCTION LLC,Tuesday Wednesday Thursday Friday Saturday,36.0594833908917,-86.7349955972504,"{'type': 'Point', 'coordinates': [-86.73499559...",1,1,21,3,3


In [13]:
row_permits_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 27 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   initiated_date               25 non-null     object
 1   permit                       25 non-null     object
 2   permit_type                  25 non-null     object
 3   permit_description           25 non-null     object
 4   status                       25 non-null     object
 5   on_street                    25 non-null     object
 6   from_street                  25 non-null     object
 7   to_street                    25 non-null     object
 8   location_address             25 non-null     object
 9   city                         25 non-null     object
 10  zip                          25 non-null     object
 11  scope                        25 non-null     object
 12  scheduled_start              25 non-null     object
 13  scheduled_end                25 non-n

In [14]:
row_permits_raw = row_permits_raw.drop(['status', ':@computed_region_wvby_4s8j', ':@computed_region_3aw5_2wv7', ':@computed_region_p6sk_2acq', ':@computed_region_gxvr_9jxz', ':@computed_region_b9k3_hpc2'], axis=1)

In [15]:
row_permits_raw['initiated_date'] = pd.to_datetime(row_permits_raw['initiated_date'])
row_permits_raw['scheduled_start'] = pd.to_datetime(row_permits_raw['scheduled_start'])
row_permits_raw['scheduled_end'] = pd.to_datetime(row_permits_raw['scheduled_end'])

In [16]:
row_permits = row_permits_raw[row_permits_raw['zip'].isin(['37206', '37216', '37115'])]
row_permits

Unnamed: 0,initiated_date,permit,permit_type,permit_description,on_street,from_street,to_street,location_address,city,zip,...,scheduled_start,scheduled_end,permit_applicant_name,council_district,conditions_traffic_control,company,days_to_work,latitude,longitude,mapped_location
6,2023-07-17 08:29:54,202325417,EXCAVATION PERMIT,SEWER TAP IN ALLEY FOR 1132-B CAHAL AVE,1017 ALY,FOOTPATH,N 16TH ST,1017 ALY,NASHVILLE,37206,...,2023-11-29,2024-02-27,TAMMY GREEN,7,Lane closed maintain two-way traffic Daylight...,PRECISION PLUMBING CO,Monday Tuesday Wednesday Thursday Friday Satur...,36.1956076228412,-86.7389212643167,"{'type': 'Point', 'coordinates': [-86.73892126..."
22,2023-07-17 08:26:53,202325416,EXCAVATION PERMIT,SEWER TAP IN ALLEY FOR 1132-A CAHAL AVE,1017 ALY,FOOTPATH,N 16TH ST,1017 ALY,NASHVILLE,37206,...,2023-11-29,2024-02-27,TAMMY GREEN,7,Lane closed maintain two-way traffic Daylight...,PRECISION PLUMBING CO,Monday Tuesday Wednesday Thursday Friday Satur...,36.1956076228412,-86.7389212643167,"{'type': 'Point', 'coordinates': [-86.73892126..."


In [17]:
row_permits.to_csv('../data/clean/row_permits.csv')

...and Nashville 311 calls<br><br>
Source: https://data.nashville.gov/Public-Services/hubNashville-311-Service-Requests/7qhx-rexh/about_data

In [18]:
nash_311_export = client.get("7qhx-rexh", limit=2000000)
nash_311_raw = pd.DataFrame.from_records(nash_311_export)
nash_311_raw.head()

In [None]:
nash_311_raw.info()

In [None]:
nash_311_raw = nash_311_raw.drop(['oem_id', ':@computed_region_wvby_4s8j', ':@computed_region_3aw5_2wv7', ':@computed_region_p6sk_2acq', ':@computed_region_gxvr_9jxz', ':@computed_region_yf9r_ed6g', ':@computed_region_fvtq_wnma', ':@computed_region_s8bq_67w7', ':@computed_region_v67z_xm3t', ':@computed_region_kh5x_g7w5', ':@computed_region_cfa7_hbpz', ':@computed_region_sjpq_96s8', ':@computed_region_gisn_y5cm', ':@computed_region_b9k3_hpc2'], axis=1)

In [None]:
nash_311_raw['date_time_opened'] = pd.to_datetime(nash_311_raw['date_time_opened'])
nash_311_raw['date_time_closed'] = pd.to_datetime(nash_311_raw['date_time_closed'])

In [None]:
nash_311 = nash_311_raw[nash_311_raw['incident_zip_code'].isin(['37206', '37216', '37115'])]
nash_311

In [None]:
nash_311.to_csv('../data/clean/nash_311.csv')

...and building permits<br><br>
Source: https://data.nashville.gov/Licenses-Permits/Building-Permits-Issued/3h5w-q8b7/about_data

In [None]:
bdlg_permits_export = client.get("3h5w-q8b7", limit=2000000)
bdlg_permits_raw = pd.DataFrame.from_records(bdlg_permits_export)
bdlg_permits_raw.head()

In [None]:
bdlg_permits_raw.info()

In [None]:
bdlg_permits_raw = bdlg_permits_raw.drop([':@computed_region_f73m_vb2k', ':@computed_region_cfa7_hbpz', ':@computed_region_gisn_y5cm', ':@computed_region_v3ji_vzam', ':@computed_region_c9xn_skx3', ':@computed_region_sjpq_96s8', ':@computed_region_kh5x_g7w5', ':@computed_region_yf9r_ed6g', ':@computed_region_fvtq_wnma', ':@computed_region_p6sk_2acq', ':@computed_region_b9k3_hpc2', ':@computed_region_gxvr_9jxz'], axis=1)

In [None]:
bdlg_permits_raw['date_entered'] = pd.to_datetime(bdlg_permits_raw['date_entered'])
bdlg_permits_raw['date_issued'] = pd.to_datetime(bdlg_permits_raw['date_issued'])

In [None]:
bdlg_permits = bdlg_permits_raw[bdlg_permits_raw['zip'].isin(['37206', '37216', '37115'])]
bdlg_permits

In [None]:
bdlg_permits.to_csv('../data/clean/bldg_permits.csv')

...and Short Term Rental Permits<br><br>
Source: https://data.nashville.gov/Licenses-Permits/Residential-Short-Term-Rental-Permits/2z82-v8pm/about_data

In [None]:
str_export = client.get("2z82-v8pm", limit=100000)
strs_raw = pd.DataFrame.from_records(str_export)
strs_raw.info()

In [None]:
strs_raw = strs_raw.drop([':@computed_region_p6sk_2acq', ':@computed_region_gxvr_9jxz', ':@computed_region_wvby_4s8j', ':@computed_region_3aw5_2wv7', ':@computed_region_cfa7_hbpz', ':@computed_region_sjpq_96s8', ':@computed_region_f73m_vb2k', ':@computed_region_c9xn_skx3', ':@computed_region_gisn_y5cm', ':@computed_region_v3ji_vzam', ':@computed_region_kdqx_a6fv'], axis=1)

In [None]:
strs_raw['date_entered'] = pd.to_datetime(strs_raw['date_entered'])
strs_raw['date_issued'] = pd.to_datetime(strs_raw['date_issued'])
strs_raw['expiration_date'] = pd.to_datetime(strs_raw['expiration_date'])

In [None]:
rentals = strs_raw[strs_raw['zip'].isin(['37206', '37216', '37115'])]
rentals.head()

In [None]:
rentals.to_csv('../data/clean/rentals.csv')