The traffic accident data is the core of this analysis. To keep it current, this dataset and several others from the City of Nashville's OpenData portal are pulled through the portal's APIs and refreshed periodically, rather than loaded from a static .csv download.

To limit the number of API requests, all of the data is pulled in this notebook, and the relevant fields are exported for use in a separate EDA notebook.

**Note:** This notebook will be revisited and edited as the data needs of this analysis change.
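One way a periodic refresh could avoid re-downloading every row is to ask the API only for records newer than the latest one already on disk. The sketch below is an assumption about how that might look, not something this notebook does: `build_since_filter` is a hypothetical helper, and the commented `client.get` call relies on the `where` keyword that sodapy passes through as a SoQL `$where` clause.

```python
from datetime import datetime


def build_since_filter(last_seen: datetime, column: str = "date_and_time") -> str:
    """Return a SoQL where clause that selects rows newer than last_seen."""
    # SoQL floating timestamps are written as ISO 8601 without a timezone.
    stamp = last_seen.strftime("%Y-%m-%dT%H:%M:%S")
    return f"{column} > '{stamp}'"


where_clause = build_since_filter(datetime(2024, 2, 13, 20, 21))
print(where_clause)  # prints: date_and_time > '2024-02-13T20:21:00'

# Hypothetical use with the Socrata client created later in this notebook:
# new_rows = client.get("6v6w-hpcw", where=where_clause, limit=200000)
```

The new rows would then need to be appended to the existing export and de-duplicated on `accident_number`, since a report can be updated after it is first published.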

In [1]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# sodapy is a third-party package; install it first with `pip install sodapy`
from sodapy import Socrata

**Data Source:** https://data.nashville.gov/Police/Traffic-Accidents/6v6w-hpcw/about_data

In [2]:
# None means no app token: requests are unauthenticated and more strictly throttled
client = Socrata("data.nashville.gov", None)



In [3]:
# "6v6w-hpcw" is the Traffic Accidents dataset ID; limit caps the number of rows returned
crashes_export = client.get("6v6w-hpcw", limit=200000)
crashes_raw = pd.DataFrame.from_records(crashes_export)
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,reporting_officer,collision_type,collision_type_description,weather,...,rpa,precinct,lat,long,mapped_location,:@computed_region_wvby_4s8j,:@computed_region_3aw5_2wv7,:@computed_region_p6sk_2acq,:@computed_region_gxvr_9jxz,property_damage
0,20240101761,2024-02-13T00:00:00.000,1,0,0,True,4004348,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,21,...,3023,NORTH,36.2316,-86.804,"{'type': 'Point', 'coordinates': [-86.804, 36....",1,1,2,16,
1,20240101440,2024-02-13T20:21:00.000,1,1,0,False,352172,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,22,...,4811,WEST,36.0755,-86.9404,"{'type': 'Point', 'coordinates': [-86.9404, 36...",1,2,31,32,
2,20240101201,2024-02-13T18:38:00.000,2,2,0,False,4001162,11,Front to Rear,21,...,20044,MADISO,36.2481,-86.743,"{'type': 'Point', 'coordinates': [-86.743, 36....",1,1,3,12,
3,20240100968,2024-02-13T16:07:00.000,2,0,0,False,4007306,11,Front to Rear,21,...,9041,HERMIT,36.144,-86.7013,"{'type': 'Point', 'coordinates': [-86.7013, 36...",1,1,8,11,
4,20240100938,2024-02-13T16:30:00.000,2,1,0,False,4007829,4,ANGLE,21,...,3135,NORTH,36.1935,-86.8309,"{'type': 'Point', 'coordinates': [-86.8309, 36...",1,1,2,34,
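The fixed `limit=200000` covers the dataset at its current size, but rows past the limit would be dropped without warning once the dataset outgrows it. A minimal sketch of paging through the results instead is shown below. `fetch_all_rows` and `fake_fetch` are hypothetical names introduced for illustration; the commented sodapy call assumes the `limit`, `offset`, and `order` keywords of `client.get`.

```python
from typing import Callable, Dict, List


def fetch_all_rows(fetch_page: Callable[[int, int], List[Dict]],
                   page_size: int = 50000) -> List[Dict]:
    """Call fetch_page(limit, offset) repeatedly until a short page is returned."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means the end of the dataset was reached.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Stand-in for the API so the sketch runs without network access.
fake_dataset = [{"accident_number": str(n)} for n in range(7)]


def fake_fetch(limit: int, offset: int) -> List[Dict]:
    return fake_dataset[offset:offset + limit]


all_rows = fetch_all_rows(fake_fetch, page_size=3)
print(len(all_rows))  # prints: 7

# Hypothetical use against the portal (a stable sort order keeps pages consistent):
# all_rows = fetch_all_rows(
#     lambda limit, offset: client.get("6v6w-hpcw", limit=limit, offset=offset, order=":id")
# )
```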


In [4]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180073 entries, 0 to 180072
Data columns (total 29 columns):
 #   Column                       Non-Null Count   Dtype 
---  ------                       --------------   ----- 
 0   accident_number              180073 non-null  object
 1   date_and_time                180073 non-null  object
 2   number_of_motor_vehicles     180072 non-null  object
 3   number_of_injuries           180073 non-null  object
 4   number_of_fatalities         180073 non-null  object
 5   hit_and_run                  180059 non-null  object
 6   reporting_officer            180062 non-null  object
 7   collision_type               180063 non-null  object
 8   collision_type_description   180063 non-null  object
 9   weather                      174027 non-null  object
 10  weather_description          174027 non-null  object
 11  illuaccidemination           179795 non-null  object
 12  illumination_description     179795 non-null  object
 13  harmfulcodes  

Now that the DataFrame has been created, some columns can be removed as they're not relevant to this analysis.

In [5]:
columns_to_drop = [
    'reporting_officer',
    'collision_type',
    'illuaccidemination',
    'harmfulcodes',
    ':@computed_region_wvby_4s8j',
    ':@computed_region_3aw5_2wv7',
    ':@computed_region_p6sk_2acq',
    ':@computed_region_gxvr_9jxz',
    'weather',
]
crashes_raw = crashes_raw.drop(columns_to_drop, axis=1)

In [6]:
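# Parse the timestamp strings returned by the API
crashes_raw['date_and_time'] = pd.to_datetime(crashes_raw['date_and_time'])

# The counts are cast to float (not int) because float columns can hold missing values.
# Caution: astype('bool') turns missing values (NaN) into True, so rows with no
# hit_and_run or property_damage value are recorded as True after this step.
crashes_raw = crashes_raw.astype({'number_of_motor_vehicles': 'float',
                                  'number_of_injuries': 'float',
                                  'number_of_fatalities': 'float',
                                  'hit_and_run': 'bool',
                                  'property_damage': 'bool'})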
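Casting an object column to plain `bool` treats a missing value as `True`, which is why `hit_and_run` goes from 180059 non-null values in the first `info()` output to 180073 in the second. If the missing values need to be preserved, pandas' nullable dtypes are one option. The sketch below uses a small made-up frame rather than the real data, and is an alternative to consider, not what this notebook currently does.

```python
import pandas as pd

# Made-up sample that mimics the raw API columns: objects with a missing value.
sample = pd.DataFrame({
    "hit_and_run": pd.Series([True, False, float("nan")], dtype="object"),
    "number_of_injuries": pd.Series(["1", "0", float("nan")], dtype="object"),
})

# Plain bool: the missing value silently becomes True.
plain = sample["hit_and_run"].astype("bool")

# Nullable boolean: the missing value stays missing (<NA>).
nullable = sample["hit_and_run"].astype("boolean")

# Nullable integer: counts stay whole numbers and still allow missing values.
injuries = pd.to_numeric(sample["number_of_injuries"]).astype("Int64")

print(plain.tolist())
print(nullable.isna().sum(), injuries.isna().sum())
```

Downstream code would then need to decide explicitly how to treat `<NA>`, for example with `fillna(False)` where "not recorded" should count as "no".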

In [7]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180073 entries, 0 to 180072
Data columns (total 20 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   accident_number             180073 non-null  object        
 1   date_and_time               180073 non-null  datetime64[ns]
 2   number_of_motor_vehicles    180072 non-null  float64       
 3   number_of_injuries          180073 non-null  float64       
 4   number_of_fatalities        180073 non-null  float64       
 5   hit_and_run                 180073 non-null  bool          
 6   collision_type_description  180063 non-null  object        
 7   weather_description         174027 non-null  object        
 8   illumination_description    179795 non-null  object        
 9   harmfuldescriptions         178006 non-null  object        
 10  street_address              180068 non-null  object        
 11  city                        180073 non-

As an initial cleaning step, all text fields are converted to a single case: upper case.

In [8]:
text_columns = ['collision_type_description', 'illumination_description',
                'harmfuldescriptions', 'street_address', 'city', 'state',
                'precinct', 'weather_description']

# .str.upper() leaves missing values as NaN; astype(str) would turn them into the text 'NAN'.
# property_damage is left out so it keeps the bool dtype assigned in the previous step.
for column in text_columns:
    crashes_raw[column] = crashes_raw[column].str.upper()

In [9]:
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,collision_type_description,weather_description,illumination_description,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location,property_damage
0,20240101761,2024-02-13 00:00:00,1.0,0.0,0.0,True,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DARK - LIGHTED,DITCH,KNIGHT DR & EWING DR,NASHVILLE,TN,37207,3023,NORTH,36.2316,-86.804,"{'type': 'Point', 'coordinates': [-86.804, 36....",True
1,20240101440,2024-02-13 20:21:00,1.0,1.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLOUDY,DARK - NOT LIGHTED,DITCH,HIGHWAY 70S & BELLEVUE RD,NASHVILLE,TN,37221,4811,WEST,36.0755,-86.9404,"{'type': 'Point', 'coordinates': [-86.9404, 36...",True
2,20240101201,2024-02-13 18:38:00,2.0,2.0,0.0,False,FRONT TO REAR,CLEAR,DARK - LIGHTED,MOTOR VEHICLE IN TRANSPORT,I65 S EXT RAMP & I 65,MADISON,TN,37115,20044,MADISO,36.2481,-86.743,"{'type': 'Point', 'coordinates': [-86.743, 36....",True
3,20240100968,2024-02-13 16:07:00,2.0,0.0,0.0,False,FRONT TO REAR,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,I 40 & UNKNOWN RAMP,NASHVILLE,TN,37210,9041,HERMIT,36.144,-86.7013,"{'type': 'Point', 'coordinates': [-86.7013, 36...",True
4,20240100938,2024-02-13 16:30:00,2.0,1.0,0.0,False,ANGLE,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,CLARKSVILLE PKE & BUENA VISTA PKE,NASHVILLE,TN,37218,3135,NORTH,36.1935,-86.8309,"{'type': 'Point', 'coordinates': [-86.8309, 36...",True


The resulting table is exported to a .csv file for use in the EDA notebook.

In [10]:
crashes_raw.to_csv('../data/clean/crashes.csv')
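A .csv file does not store dtypes, so the EDA notebook has to restore them when it reads the file. The call above also writes the DataFrame index, which comes back as an `Unnamed: 0` column. The sketch below shows one way to round-trip the data; it uses an in-memory buffer and a three-column sample taken from the rows shown earlier, and the `index=False`, `parse_dates`, and `dtype` choices are suggestions rather than what the EDA notebook is known to do.

```python
import io

import pandas as pd

# Small sample with the same kinds of columns as the cleaned crash data.
cleaned = pd.DataFrame({
    "accident_number": ["20240101761", "20240101440"],
    "date_and_time": pd.to_datetime(["2024-02-13 00:00:00", "2024-02-13 20:21:00"]),
    "zip": ["37207", "37221"],
})

# index=False keeps the row index out of the file.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# Restore the timestamp dtype and keep identifier-like columns as text.
reloaded = pd.read_csv(
    buffer,
    parse_dates=["date_and_time"],
    dtype={"accident_number": str, "zip": str},
)

print(list(reloaded.columns))
print(reloaded.dtypes)
```

Reading `accident_number` and `zip` as text avoids them being parsed as integers, which matters if any value ever has a leading zero.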

In [11]:
import folium
from folium.plugins import HeatMap

# NOTE: map_center, zipcodes, and crashes_geo are not defined anywhere in this notebook;
# they must be created before this cell will run.

# Create the base map and overlay the zip code boundaries
all_crashes_heatmap = folium.Map(location=map_center, tiles="Cartodb Positron", zoom_start=12)
folium.GeoJson(zipcodes, style_function=lambda feature: {"color": "black", "weight": 2, "dashArray": "10, 5", "fillOpacity": 0.125}).add_to(all_crashes_heatmap)

# Collect one [lat, long] pair per crash for the heat map layer
crashes = crashes_geo[['lat', 'long']].values.tolist()

# Add the heat map as its own layer so it can be toggled from the layer control
heat_layer = folium.FeatureGroup(name='Heat Map').add_to(all_crashes_heatmap)
HeatMap(crashes, radius=15, min_opacity=0.5, gradient={.5: '#ffc2c2', .75: '#ff7970', 1: '#ff0000'}).add_to(heat_layer)
folium.LayerControl().add_to(all_crashes_heatmap)

all_crashes_heatmap.save('../maps/all_crashes_heatmap.html')
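The heat map layer needs numeric coordinate pairs, but `lat` and `long` are never converted from text in this notebook and some rows may have no location at all. A minimal sketch of a pre-filter is shown below. `to_heat_points` is a hypothetical helper, and it assumes the crash records are available as a list of dicts (for example from `crashes_geo.to_dict("records")`, where `crashes_geo` is whatever frame feeds the map).

```python
import math
from typing import Dict, List


def to_heat_points(records: List[Dict]) -> List[List[float]]:
    """Return [lat, long] float pairs, skipping rows with missing or invalid values."""
    points = []
    for record in records:
        try:
            lat = float(record.get("lat"))
            lon = float(record.get("long"))
        except (TypeError, ValueError):
            # None, empty strings, and other non-numeric values are skipped.
            continue
        if math.isnan(lat) or math.isnan(lon):
            continue
        points.append([lat, lon])
    return points


sample_records = [
    {"lat": "36.2316", "long": "-86.804"},
    {"lat": None, "long": "-86.9404"},
    {"lat": float("nan"), "long": "-86.743"},
]
print(to_heat_points(sample_records))  # prints: [[36.2316, -86.804]]
```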