The traffic accident data is the core of this analysis. To keep it current, it is pulled through an API rather than downloaded as a static .csv file. Several other datasets from the City of Nashville's OpenData portal are pulled the same way so that everything can be refreshed periodically.

To limit the number of API requests, all of the data is pulled in this notebook, and the relevant fields are exported for use in a separate EDA notebook.

**Note:** This notebook will be revisited and edited as the data needs of this analysis change.

In [1]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# sodapy is a third-party package; install it first with `pip install sodapy`
from sodapy import Socrata

**Data Source:** https://data.nashville.gov/Police/Traffic-Accidents/6v6w-hpcw/about_data

In [2]:
# None means no app token; unauthenticated requests are throttled more strictly.
client = Socrata("data.nashville.gov", None)



In [3]:
# limit is set above the dataset's current row count so that every record is returned.
crashes_export = client.get("6v6w-hpcw", limit=200000)
crashes_raw = pd.DataFrame.from_records(crashes_export)
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,reporting_officer,collision_type,collision_type_description,weather,...,rpa,precinct,lat,long,mapped_location,:@computed_region_wvby_4s8j,:@computed_region_3aw5_2wv7,:@computed_region_p6sk_2acq,:@computed_region_gxvr_9jxz,property_damage
0,20240095763,2024-02-11T00:00:00.000,1,0,0,False,4004447,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,21,...,1751,MADISO,36.3055,-86.6936,"{'type': 'Point', 'coordinates': [-86.6936, 36...",1,2,17,41,
1,20240095541,2024-02-11T16:59:00.000,2,0,0,False,4011437,4,ANGLE,21,...,8517,MIDTOW,36.1064,-86.7429,"{'type': 'Point', 'coordinates': [-86.7429, 36...",1,1,9,3,
2,20240095464,2024-02-11T14:28:00.000,2,0,0,False,4003195,4,ANGLE,21,...,9557,HERMIT,36.1742,-86.6019,"{'type': 'Point', 'coordinates': [-86.6019, 36...",1,2,26,44,
3,20240095355,2024-02-11T13:50:00.000,2,0,0,True,4007613,5,SIDESWIPE - SAME DIRECTION,22,...,87060,SOUTH,36.0629,-86.6829,"{'type': 'Point', 'coordinates': [-86.6829, 36...",1,1,28,15,
4,20240095275,2024-02-11T00:00:00.000,2,0,0,False,226827,5,SIDESWIPE - SAME DIRECTION,22,...,4696,WEST,36.153,-86.8576,"{'type': 'Point', 'coordinates': [-86.8576, 36...",1,1,24,46,
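
The single `client.get` call above returns everything only because `limit=200000` exceeds the current row count. As one possible alternative, the sketch below pages through a dataset with `limit` and `offset` until a short page signals the end. `fetch_all` and `fetch_page` are hypothetical names, not part of sodapy, and the in-memory `fake_rows` list stands in for the API so the example runs offline.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def fetch_all(fetch_page: Callable[[int, int], List[Record]],
              page_size: int = 50000) -> List[Record]:
    """Request pages of `page_size` rows until a short page marks the end."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    records: List[Record] = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:  # last (possibly empty) page reached
            return records
        offset += page_size


# Offline stand-in for the API (an assumption for illustration):
# seven fake rows served three at a time.
fake_rows = [{"accident_number": str(n)} for n in range(7)]
paged = fetch_all(lambda limit, offset: fake_rows[offset:offset + limit],
                  page_size=3)
print(len(paged))  # 7
```

With the real client, the callable could be `lambda limit, offset: client.get("6v6w-hpcw", limit=limit, offset=offset, order=":id")`. A stable `order` helps keep pages from overlapping or skipping rows.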


In [4]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179999 entries, 0 to 179998
Data columns (total 29 columns):
 #   Column                       Non-Null Count   Dtype 
---  ------                       --------------   ----- 
 0   accident_number              179999 non-null  object
 1   date_and_time                179999 non-null  object
 2   number_of_motor_vehicles     179998 non-null  object
 3   number_of_injuries           179999 non-null  object
 4   number_of_fatalities         179999 non-null  object
 5   hit_and_run                  179985 non-null  object
 6   reporting_officer            179988 non-null  object
 7   collision_type               179989 non-null  object
 8   collision_type_description   179989 non-null  object
 9   weather                      173954 non-null  object
 10  weather_description          173954 non-null  object
 11  illuaccidemination           179721 non-null  object
 12  illumination_description     179721 non-null  object
 13  harmfulcodes  

Now that the DataFrame has been created, some columns can be removed as they're not relevant to this analysis.

In [5]:
crashes_raw = crashes_raw.drop(
    [
        'reporting_officer',
        'collision_type',
        'illuaccidemination',
        'harmfulcodes',
        ':@computed_region_wvby_4s8j',
        ':@computed_region_3aw5_2wv7',
        ':@computed_region_p6sk_2acq',
        ':@computed_region_gxvr_9jxz',
        'weather',
    ],
    axis=1,
)
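
The `:@computed_region_...` names are generated by the portal, so listing each one by hand can be brittle. A minimal sketch of a prefix-based drop follows. `drop_computed_regions` is a hypothetical helper, and the one-row `sample` frame is made up for illustration.

```python
import pandas as pd


def drop_computed_regions(df: pd.DataFrame,
                          prefix: str = ":@computed_region") -> pd.DataFrame:
    """Return a copy of df without columns whose name starts with `prefix`."""
    keep = [col for col in df.columns if not str(col).startswith(prefix)]
    return df[keep].copy()


# Made-up sample row using column names from the dataset.
sample = pd.DataFrame({
    "accident_number": ["20240095763"],
    ":@computed_region_wvby_4s8j": ["1"],
    "precinct": ["MADISO"],
})
trimmed = drop_computed_regions(sample)
print(list(trimmed.columns))  # ['accident_number', 'precinct']
```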

In [6]:
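# Parse timestamps and give the count and flag columns explicit types.
crashes_raw['date_and_time'] = pd.to_datetime(crashes_raw['date_and_time'])
# Caution: casting to plain 'bool' turns missing values into True, which is why
# hit_and_run shows no nulls after this cell.
crashes_raw = crashes_raw.astype({'number_of_motor_vehicles': 'float',
                                  'number_of_injuries': 'float',
                                  'number_of_fatalities': 'float',
                                  'hit_and_run': 'bool',
                                  'property_damage': 'bool'})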
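
Casting to plain `bool` cannot represent a missing value, so gaps in `hit_and_run` and `property_damage` become `True`. If keeping those gaps matters, pandas' nullable `boolean` and `Int64` dtypes are one option. The sketch below is not the notebook's method: `to_nullable_types` is a hypothetical helper, the sample values are made up, and it assumes the count columns only ever hold whole numbers.

```python
import numpy as np
import pandas as pd


def to_nullable_types(df, count_cols, flag_cols):
    """Cast count columns to Int64 and flag columns to boolean, keeping gaps."""
    out = df.copy()
    for col in count_cols:
        # Unparseable text becomes missing; assumes counts are whole numbers.
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("Int64")
    for col in flag_cols:
        out[col] = out[col].astype("boolean")
    return out


# Made-up sample: the API returns counts as text and flags as booleans.
sample = pd.DataFrame({
    "number_of_injuries": pd.Series(["0", "2", None], dtype=object),
    "hit_and_run": pd.Series([True, False, np.nan], dtype=object),
})
typed = to_nullable_types(sample, ["number_of_injuries"], ["hit_and_run"])
print(typed.dtypes)
print(typed["hit_and_run"].isna().sum())  # 1
```

The trade-off is that downstream code then has to handle `<NA>` explicitly, for example with `fillna(False)` where treating a gap as `False` is acceptable.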

In [7]:
crashes_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179999 entries, 0 to 179998
Data columns (total 20 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   accident_number             179999 non-null  object        
 1   date_and_time               179999 non-null  datetime64[ns]
 2   number_of_motor_vehicles    179998 non-null  float64       
 3   number_of_injuries          179999 non-null  float64       
 4   number_of_fatalities        179999 non-null  float64       
 5   hit_and_run                 179999 non-null  bool          
 6   collision_type_description  179989 non-null  object        
 7   weather_description         173954 non-null  object        
 8   illumination_description    179721 non-null  object        
 9   harmfuldescriptions         177933 non-null  object        
 10  street_address              179994 non-null  object        
 11  city                        179999 non-

For some initial cleaning, all text fields are converted to a single case, here upper case.

In [8]:
# Columns to normalise to upper case.
text_columns = ['collision_type_description', 'illumination_description',
                'harmfuldescriptions', 'street_address', 'city', 'state',
                'precinct', 'weather_description', 'property_damage']

# astype(str) also stringifies missing values, so they may end up as the text 'NAN'.
for column in text_columns:
    crashes_raw[column] = crashes_raw[column].astype(str).str.upper()
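
Calling `.astype(str)` before `.str.upper()` converts every value to text, including missing ones. If missing descriptions should stay missing, `.str.upper()` on its own leaves them untouched. This is a minimal sketch with made-up values, and `upper_text_columns` is a hypothetical helper. It assumes the listed columns contain only strings and missing values.

```python
import numpy as np
import pandas as pd


def upper_text_columns(df, columns):
    """Upper-case the given text columns while leaving missing values missing."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.upper()
    return out


# Made-up sample with one missing description.
sample = pd.DataFrame({
    "weather_description": pd.Series(["Clear", np.nan, "cloudy"], dtype=object),
})
cleaned = upper_text_columns(sample, ["weather_description"])
print(cleaned["weather_description"].isna().sum())  # 1
```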

In [9]:
crashes_raw.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,hit_and_run,collision_type_description,weather_description,illumination_description,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location,property_damage
0,20240095763,2024-02-11 00:00:00,1.0,0.0,0.0,False,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,CLEAR,DARK - NOT LIGHTED,CURB,CONFERENCE DR & RIVERGATE MALL PVTDR,GOODLETTSVILLE,TN,37072,1751,MADISO,36.3055,-86.6936,"{'type': 'Point', 'coordinates': [-86.6936, 36...",True
1,20240095541,2024-02-11 16:59:00,2.0,0.0,0.0,False,ANGLE,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,MORTON AV & NOLENSVILLE PKE,NASHVILLE,TN,37211,8517,MIDTOW,36.1064,-86.7429,"{'type': 'Point', 'coordinates': [-86.7429, 36...",True
2,20240095464,2024-02-11 14:28:00,2.0,0.0,0.0,False,ANGLE,CLEAR,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,OLD HICKORY BLVD & BURNING TREE DR,HERMITAGE,TN,37076,9557,HERMIT,36.1742,-86.6019,"{'type': 'Point', 'coordinates': [-86.6019, 36...",True
3,20240095355,2024-02-11 13:50:00,2.0,0.0,0.0,True,SIDESWIPE - SAME DIRECTION,CLOUDY,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,MM 57 6 I 24,ANTIOCH,TN,37013,87060,SOUTH,36.0629,-86.6829,"{'type': 'Point', 'coordinates': [-86.6829, 36...",True
4,20240095275,2024-02-11 00:00:00,2.0,0.0,0.0,False,SIDESWIPE - SAME DIRECTION,CLOUDY,DAYLIGHT,MOTOR VEHICLE IN TRANSPORT,BRILEY PKWY & I40 W EXT RAMP,NASHVILLE,TN,37209,4696,WEST,36.153,-86.8576,"{'type': 'Point', 'coordinates': [-86.8576, 36...",True


The resulting table is exported to a .csv file for use in the EDA notebook.

In [10]:
crashes_raw.to_csv('../data/clean/crashes.csv')
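
Writing with the default settings also saves the row index, which comes back as an `Unnamed: 0` column when the file is read. A .csv file does not store dtypes either, so identifiers and timestamps need to be declared again on load. The sketch below shows one way the EDA notebook might handle both. It uses an in-memory buffer in place of `../data/clean/crashes.csv` and a made-up two-row frame, and the choice of which columns to read as text is an assumption.

```python
import io

import pandas as pd

# Made-up sample using column names from the dataset.
sample = pd.DataFrame({
    "accident_number": ["20240095763", "20240095541"],
    "date_and_time": pd.to_datetime(["2024-02-11 00:00:00",
                                     "2024-02-11 16:59:00"]),
    "zip": ["37072", "37211"],
})

buffer = io.StringIO()              # stands in for the .csv file on disk
sample.to_csv(buffer, index=False)  # index=False: no extra unnamed column
buffer.seek(0)

loaded = pd.read_csv(
    buffer,
    dtype={"accident_number": str, "zip": str},  # keep identifiers as text
    parse_dates=["date_and_time"],               # restore the datetime dtype
)
print(list(loaded.columns))  # ['accident_number', 'date_and_time', 'zip']
```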