# US Traffic Accident Dataset Exploration
Grant Hanley

Working Analysis 

Last Updated: JAN 2 2022


## Motivation

Car accidents are a major public health concern, causing significant numbers of deaths and injuries every year. According to the Centers for Disease Control and Prevention (CDC), motor vehicle accidents are the leading cause of death for people aged 1-54 in the United States. In 2019, over 36,000 people died in car accidents in the US, and millions more were injured. These accidents not only have devastating personal and emotional impacts, but they also have significant economic costs. The National Highway Traffic Safety Administration estimates that the economic cost of motor vehicle crashes in the US was over $800 billion in 2020, including costs such as medical expenses, lost productivity, and property damage. Reducing the number of car accidents, deaths, and injuries is therefore a crucial goal, with the potential to save lives and reduce economic burden.

## Application

Geospatial analytics and data science techniques can help identify where car accidents recur, which in turn may point to local interventions that reduce their number. Analyzing accident data together with geographic information reveals patterns and trends in the location and frequency of accidents. For example, geospatial analysis can identify hotspots, areas with a high concentration of accidents, which may be caused by factors such as poor road design, high traffic volumes, or high speeds.

Data science techniques such as machine learning and predictive modeling can also be used to identify the factors that contribute to accident risk. For example, data on weather conditions, road conditions, and traffic volumes can be used to build models that predict the likelihood of an accident at a particular location. These predictions can inform interventions aimed at reducing the number of accidents in high-risk areas.

By identifying where accidents recur and understanding the factors that contribute to them, geospatial analytics and data science techniques can help policymakers and other stakeholders develop targeted interventions to reduce the number of car accidents, deaths, and injuries. These interventions may include improved road design, traffic calming measures, or educational campaigns that promote safe driving practices.

## The Data 
This dataset was found on Kaggle here: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents?resource=download. The authors request the following citations: 

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.”, 2019. 

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

In their words, "This is a countrywide car accident dataset, which covers 49 states of the USA. The accident data are collected from February 2016 to Dec 2021, using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road-networks. Currently, there are about 2.8 million accident records in this dataset."

In [134]:
#Requirements
import pandas as pd
import numpy as np
import plotly.express as px
import scipy
import geopandas as gpd
import statistics
from shapely.geometry import Point

## Load Data

In [135]:
#Load Large Dataset ~1.1 GB using chunksize
#Dataset found here: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents/download?datasetVersionNumber=12
reader = pd.read_csv('US_Accidents_Dec21_updated.csv', iterator=True, chunksize=10000)

#use pd.concat to get all 2.8mil rows, takes ~2 mins...apparently dask can improve efficiency
df = pd.concat(reader, ignore_index=True)

The car accident dataset is over 1 GB, so it is read in chunks of 10,000 rows rather than in a single pass. Calling `pd.read_csv` with `chunksize` returns a TextFileReader iterator, and `pd.concat` then combines the chunks into a single pandas DataFrame.
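
Because each chunk is an ordinary DataFrame, a filter can be applied to it before concatenation, so that only the rows of interest are kept in memory. The following is a minimal sketch under that assumption; the helper name `load_state` and the tiny in-memory CSV are illustrative stand-ins and are not part of the original notebook.

```python
import io
import pandas as pd

def load_state(csv_source, state, chunksize=10000):
    """Read a large CSV in chunks and keep only the rows for one state."""
    reader = pd.read_csv(csv_source, chunksize=chunksize)
    # Filter every chunk first so the unfiltered file is never fully in memory
    kept = [chunk[chunk["State"] == state] for chunk in reader]
    return pd.concat(kept, ignore_index=True)

# Hypothetical stand-in for US_Accidents_Dec21_updated.csv
sample_csv = io.StringIO(
    "ID,Severity,State\n"
    "A-1,3,OH\n"
    "A-2,2,VA\n"
    "A-3,2,VA\n"
    "A-4,4,MD\n"
)
va = load_state(sample_csv, "VA", chunksize=2)
print(va)
```

With the real file, `csv_source` would be the CSV path; whether this is faster than filtering after a full load depends on how many rows survive the filter.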

In [None]:
#Display the maximum number of columns, show the first few data points
pd.set_option('display.max_columns', None)
df.head(3)


Unnamed: 0,ID,Severity,Start_Time,End_Time,Start_Lat,Start_Lng,End_Lat,End_Lng,Distance(mi),Description,Number,Street,Side,City,County,State,Zipcode,Country,Timezone,Airport_Code,Weather_Timestamp,Temperature(F),Wind_Chill(F),Humidity(%),Pressure(in),Visibility(mi),Wind_Direction,Wind_Speed(mph),Precipitation(in),Weather_Condition,Amenity,Bump,Crossing,Give_Way,Junction,No_Exit,Railway,Roundabout,Station,Stop,Traffic_Calming,Traffic_Signal,Turning_Loop,Sunrise_Sunset,Civil_Twilight,Nautical_Twilight,Astronomical_Twilight
0,A-1,3,2016-02-08 00:37:08,2016-02-08 06:37:08,40.10891,-83.09286,40.11206,-83.03187,3.23,Between Sawmill Rd/Exit 20 and OH-315/Olentang...,,Outerbelt E,R,Dublin,Franklin,OH,43017,US,US/Eastern,KOSU,2016-02-08 00:53:00,42.1,36.1,58.0,29.76,10.0,SW,10.4,0.0,Light Rain,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Night
1,A-2,2,2016-02-08 05:56:20,2016-02-08 11:56:20,39.86542,-84.0628,39.86501,-84.04873,0.747,At OH-4/OH-235/Exit 41 - Accident.,,I-70 E,R,Dayton,Montgomery,OH,45424,US,US/Eastern,KFFO,2016-02-08 05:58:00,36.9,,91.0,29.68,10.0,Calm,,0.02,Light Rain,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Night
2,A-3,2,2016-02-08 06:15:39,2016-02-08 12:15:39,39.10266,-84.52468,39.10209,-84.52396,0.055,At I-71/US-50/Exit 1 - Accident.,,I-75 S,R,Cincinnati,Hamilton,OH,45203,US,US/Eastern,KLUK,2016-02-08 05:53:00,36.0,,97.0,29.7,10.0,Calm,,0.02,Overcast,False,False,False,False,True,False,False,False,False,False,False,False,False,Night,Night,Night,Day


In [None]:
#check lengths and data types
print(len(df))
print(df.dtypes)

2845342
ID                        object
Severity                   int64
Start_Time                object
End_Time                  object
Start_Lat                float64
Start_Lng                float64
End_Lat                  float64
End_Lng                  float64
Distance(mi)             float64
Description               object
Number                   float64
Street                    object
Side                      object
City                      object
County                    object
State                     object
Zipcode                   object
Country                   object
Timezone                  object
Airport_Code              object
Weather_Timestamp         object
Temperature(F)           float64
Wind_Chill(F)            float64
Humidity(%)              float64
Pressure(in)             float64
Visibility(mi)           float64
Wind_Direction            object
Wind_Speed(mph)          float64
Precipitation(in)        float64
Weather_Condition         object
Am

The dataset contains over 2.8 million car accident records. Each record includes geocoded spatio-temporal data along with variables describing the weather at the time of the accident and the road features in the area where the accident occurred.

## Data Wrangling

### Filtering

In [None]:
# Filter data to state of virginia only
df = df[(df['State'] == 'VA')]

#narrow to a county, use geopandas to load a county polygon, filter to accidents within the county polygon
#will require a later merge with county shapefile
#df = df[(df['State'] == 'VA') & (df['County'] == 'Prince William')]

#consider zipcode filtering
#merge with ZCTA from Census.gov

#extract I-95 lines from open street map, create a buffer polygon around the line, filter to accidents within the buffer
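
The buffer idea in the last comment can be sketched with shapely alone. This is only an illustration: the line below is a made-up, heavily simplified stand-in for I-95 (real geometry would come from OpenStreetMap), the point names are hypothetical, and the buffer distance is expressed in degrees for simplicity.

```python
from shapely.geometry import LineString, Point

# Hypothetical simplified road centerline in (longitude, latitude) order
road = LineString([(-77.37, 38.52), (-77.26, 38.66), (-77.21, 38.71)])

# Buffer of 0.01 degrees (roughly 1 km north-south); an assumed value.
# For a buffer in meters, reproject to a projected CRS before buffering.
corridor = road.buffer(0.01)

# Two illustrative accident start points
accidents = {
    "near_corridor": Point(-77.36779, 38.52464),
    "far_from_corridor": Point(-76.3412, 36.7587),
}

# Keep only the points that fall inside the buffer polygon
inside = {name: corridor.contains(pt) for name, pt in accidents.items()}
print(inside)
```

The same polygon test applies to a county boundary loaded with geopandas; only the source of the polygon changes.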

In [None]:
#Round 'Start_Lat' and 'Start_Lng' to 4 decimal places so that accidents starting within a small area share one location
df['lat'] = df['Start_Lat'].round(4)
df['lon'] = df['Start_Lng'].round(4)

# Change data type of severity to an integer to support filtering 
df["Severity"] = df["Severity"].astype(int)

#groupby but show the top 10 locations instead of the top 10 data points
grouped_count = df.groupby(['lon', 'lat']).size().rename('count').reset_index()
grouped_count.sort_values(by='count', ascending=False).head(10)


Unnamed: 0,lon,lat,count
20061,-77.4788,37.5525,218
42186,-76.3412,36.7587,121
19777,-77.4838,37.5543,106
22928,-77.4439,37.553,105
20127,-77.4776,37.5521,104
19993,-77.4801,37.553,103
42784,-76.2946,36.9667,99
43256,-76.2717,36.9628,92
35051,-77.1709,38.7953,88
40852,-76.4537,37.0156,87
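
To get a feel for how large an area a 4-decimal rounding groups together, the step size can be converted to meters. This is a rough worked calculation assuming a spherical Earth and about 111,320 m per degree of latitude; the function name is hypothetical.

```python
import math

def rounding_cell_size_m(decimals, latitude_deg):
    """Approximate size in meters of one rounding step at a given latitude."""
    meters_per_deg_lat = 111_320.0            # rough spherical-Earth figure (assumption)
    step_deg = 10 ** (-decimals)
    north_south = step_deg * meters_per_deg_lat
    # Longitude degrees shrink with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

# Latitude of the most frequent location in the table above
ns, ew = rounding_cell_size_m(4, 37.55)
print(f"{ns:.1f} m north-south, {ew:.1f} m east-west")
```

At Virginia's latitudes this works out to a cell of roughly 11 m by 9 m, so accidents grouped into one location started within about a road width of each other.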


### Merge 

In [None]:
# Create new variable 'lat_lon' with combined lat longs to support merging
grouped_count['lat_lon']= list(zip(grouped_count['lat'], grouped_count['lon']))
df['lat_lon']= list(zip(df['lat'], df['lon']))

# Merge the original df and grouped_count df to maintain count information as a feature for all data points
# Use a left join to retain all information in the Car Accident Dataset
df = pd.merge(df, grouped_count, on ='lat_lon', how ='left')

# Drop added variables created from the join, rename variables changed from the merge
df = df.drop(['lon_y','lat_y'], axis=1).rename(columns={'lat_x':'lat', 'lon_x':'lon'})

# Display sample .head()
df.head(3)


Unnamed: 0,ID,Severity,Start_Time,End_Time,Start_Lat,Start_Lng,End_Lat,End_Lng,Distance(mi),Description,Number,Street,Side,City,County,State,Zipcode,Country,Timezone,Airport_Code,Weather_Timestamp,Temperature(F),Wind_Chill(F),Humidity(%),Pressure(in),Visibility(mi),Wind_Direction,Wind_Speed(mph),Precipitation(in),Weather_Condition,Amenity,Bump,Crossing,Give_Way,Junction,No_Exit,Railway,Roundabout,Station,Stop,Traffic_Calming,Traffic_Signal,Turning_Loop,Sunrise_Sunset,Civil_Twilight,Nautical_Twilight,Astronomical_Twilight,lat,lon,lat_lon,count
0,A-31732,3,2016-12-01 05:59:45,2016-12-01 11:59:45,38.52464,-77.36779,38.52242,-77.37061,0.216,At Russell Rd/Exit 148 - Accident.,,I-95 S,R,Quantico,Prince William,VA,22134,US,US/Eastern,KNYG,2016-12-01 05:56:00,53.1,,57.0,29.73,10.0,WNW,4.6,,Clear,False,False,False,False,True,False,False,False,False,False,False,False,False,Night,Night,Night,Day,38.5246,-77.3678,"(38.5246, -77.3678)",13
1,A-31733,3,2016-12-01 05:58:25,2016-12-01 11:58:25,37.38017,-77.4116,37.3489,-77.40544,2.187,Between VA-288/Exit 62 and VA-10/Exit 61 - Acc...,,Richmond Petersburg Tpke S,R,Richmond,Chesterfield,VA,23237,US,US/Eastern,KFCI,2016-12-01 05:55:00,55.4,,44.0,29.74,10.0,West,5.8,,Clear,False,False,False,False,True,False,False,False,False,False,False,False,False,Night,Night,Night,Day,37.3802,-77.4116,"(37.3802, -77.4116)",9
2,A-31734,4,2016-12-01 05:54:28,2016-12-01 11:54:28,38.81198,-77.173274,38.81194,-77.176097,0.152,Closed at Braddock Rd - Road closed due to acc...,6616.0,Braddock Rd,L,Annandale,Fairfax,VA,22003-6103,US,US/Eastern,KDAA,2016-12-01 05:58:00,47.8,,76.0,29.71,10.0,WNW,6.9,,Clear,False,False,True,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Day,38.812,-77.1733,"(38.812, -77.1733)",2
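
The same per-row count can be obtained without the tuple column and merge. A possible alternative, shown on a hypothetical miniature table, is a grouped `transform`, which returns one value per original row:

```python
import pandas as pd

# Hypothetical miniature of the accident table with rounded coordinates
demo = pd.DataFrame({
    "ID":  ["A-1", "A-2", "A-3", "A-4"],
    "lat": [37.5525, 37.5525, 36.7587, 37.5525],
    "lon": [-77.4788, -77.4788, -76.3412, -77.4788],
})

# transform("size") broadcasts each group's row count back onto its rows,
# so the result lines up with demo's index and no join is needed
demo["count"] = demo.groupby(["lat", "lon"])["ID"].transform("size")
print(demo)
```

This avoids the `_x`/`_y` column suffixes that the merge produces, at the cost of not keeping a separate table of unique locations.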


### Date Time Format

In [None]:
# Convert the time columns to pandas datetime (vectorized over each column)
df['Start_Time'] = pd.to_datetime(df['Start_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])
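
Once the columns are datetimes, temporal features can be derived through the `.dt` accessor. A small sketch using the first two records shown earlier; the new column names `hour`, `day_name`, and `duration_h` are illustrative choices, not columns of the dataset.

```python
import pandas as pd

demo = pd.DataFrame({
    "Start_Time": ["2016-02-08 00:37:08", "2016-02-08 05:56:20"],
    "End_Time":   ["2016-02-08 06:37:08", "2016-02-08 11:56:20"],
})
demo["Start_Time"] = pd.to_datetime(demo["Start_Time"])
demo["End_Time"] = pd.to_datetime(demo["End_Time"])

# Hour of day and weekday name of the accident start
demo["hour"] = demo["Start_Time"].dt.hour
demo["day_name"] = demo["Start_Time"].dt.day_name()

# Duration of the traffic impact in hours
demo["duration_h"] = (demo["End_Time"] - demo["Start_Time"]).dt.total_seconds() / 3600
print(demo)
```

Features like these could later feed a time slider on the map or a model of accident likelihood by time of day.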

### GeoDataFrame

In [None]:

# Create a geometry column of shapely Points in (longitude, latitude) order
point_geometry = [Point(xy) for xy in zip(df['Start_Lng'], df['Start_Lat'])]

# Coordinate reference system: WGS84, given as an '<authority>:<code>' string
crsys = "EPSG:4326"

# Create a GeoDataFrame that keeps the date-time columns (spatio-temporal data)
df = gpd.GeoDataFrame(df, crs=crsys, geometry=point_geometry)



In [None]:
#find the mean and standard deviation of accident counts per location, then flag locations more than two standard deviations above the mean (intended to approximate the top ~5% of locations)
#consider removing, consider the raster approach in lieu of count approach, better for later ML applications
avg_accident_rate = grouped_count['count'].mean()
std_accident_rate = grouped_count['count'].std()
twosigma = avg_accident_rate+2*std_accident_rate

#groupby lat and long, create count variable
hotspots = df.groupby(['lon', 'lat']).size().rename('count').reset_index().sort_values(by='count', ascending=False)
hotspots = hotspots[(hotspots['count']>= twosigma)]

#plotly figure with hotspots identified
#...
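
Accident counts per location are heavily skewed, so a mean-plus-two-standard-deviations rule does not necessarily select the top 5% of locations. A quantile threshold targets that share directly. The sketch below compares the two rules on hypothetical counts; with the real data the input would be `grouped_count['count']`.

```python
import pandas as pd

# Hypothetical skewed per-location counts (most locations see one accident)
counts = pd.Series([1] * 80 + [2] * 10 + [3, 4, 5, 6, 8, 10, 20, 40, 100, 200])

# Rule 1: mean plus two standard deviations
two_sigma = counts.mean() + 2 * counts.std()

# Rule 2: 95th percentile of the counts
p95 = counts.quantile(0.95)

# Share of locations each rule would flag as hotspots
share_two_sigma = (counts >= two_sigma).mean()
share_p95 = (counts >= p95).mean()
print(two_sigma, p95, share_two_sigma, share_p95)
```

On this toy series the two-sigma rule flags fewer locations than the quantile rule; which behavior is preferable depends on how many hotspots can realistically be investigated.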

In [None]:

# Change data type of severity to an integer to support filtering 
df["Severity"] = df["Severity"].astype(int)

# Focus on a major slowdown area in the I-95 corridor
# .copy() makes focus an independent DataFrame, so assigning columns to it is safe
focus = df[(df['Start_Lat'] >= 38.66) & (df['Start_Lat'] <= 38.71) & (df['Start_Lng'] >= -77.26) & (df['Start_Lng'] <= -77.21)].copy()

# Ensure the time columns are pandas datetimes
focus['Start_Time'] = pd.to_datetime(focus['Start_Time'])
focus['End_Time'] = pd.to_datetime(focus['End_Time'])

#create point map or scatter mapbox from plotly and customize layout options
pointmap = px.scatter_mapbox(
    focus,
    lat="Start_Lat",
    lon="Start_Lng",
    color = "Severity",
    hover_name= "Severity",
    hover_data=["lon", "lat", "count"],
    zoom=12,
    height=400,
    color_discrete_sequence=px.colors.qualitative.Prism
)
pointmap.update_layout(mapbox_style="open-street-map")
pointmap.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})

#display the point map
pointmap.show()



In [None]:
from scipy import ndimage
import matplotlib.pyplot as plt
import numpy as np

def plot_log_heatmap(d, bins=(100,100), smoothing=1.3, cmap='jet'):
    """Plot a smoothed log-count heatmap of the point geometries in GeoDataFrame d."""
    x = d.geometry.x.to_numpy()   # longitude
    y = d.geometry.y.to_numpy()   # latitude

    # Rows of counts follow latitude, columns follow longitude
    counts, lat_edges, lon_edges = np.histogram2d(y, x, bins=bins)
    extent = [lon_edges[0], lon_edges[-1], lat_edges[-1], lat_edges[0]]

    # Log of the counts, with empty cells (log(0) = -inf) set to 0
    with np.errstate(divide='ignore'):
        logheatmap = np.log(counts)
    logheatmap[np.isneginf(logheatmap)] = 0
    logheatmap = ndimage.gaussian_filter(logheatmap, smoothing, mode='nearest')
    
    plt.imshow(logheatmap, cmap=cmap, extent=extent)
    plt.colorbar()
    plt.gca().invert_yaxis()
    plt.show()

In [133]:
plot_log_heatmap(focus, bins=50, smoothing=1.5)

## Geospatial Point Map

Point maps are useful for visualizing data because they show each record in relation to the geography of the area being analyzed. This helps in understanding the context of the data and in identifying spatial patterns and trends.

Here a point map, with points colored by accident severity, is used as an initial attempt to understand where car accidents recur.

- Severe Virginia car accidents using pandas and plotly
- build in time slider

In [None]:
#color by severity, required to be a string to be categorical
df["Severity"] = df["Severity"].astype(str)

#create point map or scatter mapbox from plotly and customize layout options
pointmap = px.scatter_mapbox(
    df,
    lat="Start_Lat",
    lon="Start_Lng",
    color="Severity",
    hover_name="ID",
    hover_data=["Start_Time","Severity","City","State", "Zipcode", "Street"],
    zoom=6,
    height=400,
    color_discrete_sequence=px.colors.qualitative.Prism
)
pointmap.update_layout(mapbox_style="open-street-map")
pointmap.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})

#display the point map
pointmap.show()

## Geospatial Heatmap

A geospatial heatmap visualizes data with a geographic component, typically using a color scale to represent the data values. Geospatial heatmaps are particularly useful for identifying spatial patterns and trends because they show how the data varies across locations. The hotspots in the heatmap here represent locations where multiple accidents recur in close proximity; they highlight areas that may require further investigation or analysis.

- Heatmap of Virginia car accidents using pandas, statistics, and plotly

In [131]:
#find the center lat and long for the data in our set to center the map using the statistics package
meanLong = statistics.mean(df['Start_Lng'])
meanLat = statistics.mean(df['Start_Lat'])

#create a density heatmap figure of the data
density_map = px.density_mapbox(df, lat='lat', lon='lon', z='count', radius=5,
                        center=dict(lat=meanLat, lon=meanLong), zoom=6,
                        mapbox_style="open-street-map")
density_map.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
density_map.show()

**Explore GeoPandas, Shapely and other Geospatial packages**
- Include geospatial polygons on a plotly map, county or zipcode narrow
- Do some demos of the fundamental use of these packages.

- Shapely: https://shapely.readthedocs.io/en/stable/: This package provides tools for working with geometric objects such as points, lines, and polygons. It is often used in conjunction with Geopandas to perform spatial operations on geospatial data.
- GeoPandas: https://geopandas.org/en/stable/: This package allows you to manipulate and analyze geospatial data stored in the form of dataframes, similar to the way you would work with regular pandas dataframes. It is built on top of the popular pandas package and allows you to easily perform operations such as merging, intersecting, and dissolving geospatial data.

- Rasterio: https://rasterio.readthedocs.io/en/stable/: This package allows you to read, write, and manipulate raster data such as satellite imagery or terrain data. It is designed to work with the popular numpy package and can be used to perform operations such as image processing and analysis.

- Folium: https://python-visualization.github.io/folium/: This package allows you to create interactive maps using the Leaflet.js library and display them in a Jupyter notebook or web page. It is particularly useful for visualizing geospatial data and can be used to create choropleth maps, heatmaps, and marker maps.

- GDAL/OGR: https://gdal.org/: This package provides a set of tools for reading, writing, and manipulating geospatial data in a variety of formats. It is a powerful library that is widely used in the GIS community and is often used in conjunction with other packages such as Rasterio and Geopandas.

- SciPy: https://docs.scipy.org/doc/scipy/reference/index.html: SciPy is a collection of open source scientific and technical computing tools for Python. It is built on top of the NumPy library and includes a variety of submodules for tasks such as optimization, linear algebra, signal processing, and more.

**OpenStreetMap Feature Extraction**
- use osm data to extract features at the specific location

**What OSM features are associated with high density accident locations?**
- extract road/street features
    - add buffer to street lines, filter by 
- extract buildings to create a raster 
    - supports object detection refinement
- create a raster from osm extracted features
- create a raster from the accident data
    - representative of accident likelihood

**Machine Learning Model**
- Build a probability of car accident model from nearby osm features
    - focus in on the top 10-20 most significant locations in northern va
    - extract nearby features
    - rasterize features 
    - build rasters into a model

**Computer Vision Integration**
- Imagery acquisition
- bring in raster imagery of those locations
- plot specific accidents on satellite imagery
    - consider a mapbox api account
    - consider transitioning to folium for this: https://vexceldata.com/lets-get-technical-using-web-map-tiles-in-python-pt-1/ 
- use opencv to conduct object detection on imagery of the locations

In [None]:
#gather zcta geometries from census.gov
import geopandas as gpd

# Read in a shapefile of US Zip code tabulation areas 
gdf_zips = gpd.read_file("https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip")

# Print the first few rows of the data
gdf_zips.head()
len(gdf_zips)

# Need to use ZCTA to zipcode crosswalk to conduct geometry merge: https://udsmapper.org/zip-code-to-zcta-crosswalk/

# Merge with another dataframe with reference data...

# Display polygons...

# Apply to Virginia...Zips

# Create a filter from the shape files, area filter
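
The area filter outlined in the comments above could be done with a spatial join. The sketch below uses two made-up square polygons with invented codes in place of the Census ZCTA shapes, and three illustrative accident points; only `gpd.sjoin` and `gpd.points_from_xy` are real geopandas calls here.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical stand-ins for ZCTA polygons (codes are invented)
areas = gpd.GeoDataFrame(
    {"ZCTA": ["00001", "00002"]},
    geometry=[box(-78.0, 37.0, -77.0, 38.0), box(-77.0, 37.0, -76.0, 38.0)],
    crs="EPSG:4326",
)

# Illustrative accident start points; the third lies outside both polygons
pts = pd.DataFrame({
    "ID": ["A-1", "A-2", "A-3"],
    "Start_Lng": [-77.4788, -76.4537, -83.0929],
    "Start_Lat": [37.5525, 37.0156, 40.1089],
})
accidents = gpd.GeoDataFrame(
    pts,
    geometry=gpd.points_from_xy(pts["Start_Lng"], pts["Start_Lat"]),
    crs="EPSG:4326",
)

# Inner join keeps only accidents within a polygon and attaches its attributes
joined = gpd.sjoin(accidents, areas, how="inner", predicate="within")
print(joined[["ID", "ZCTA"]])
```

Both layers must share a CRS before joining; the real ZCTA file may need `to_crs("EPSG:4326")` first if it is published in a different one.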


- Create a raster file from point data
    - interpolate point data
- Extract raster values at a specific location

In [None]:
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

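#define interpolation inputs: points in (x, y) = (longitude, latitude) order to match the grid below
points = list(zip(df.Start_Lng,df.Start_Lat))
#every accident gets a value of 1; note that interpolating a constant gives a constant surface inside the data's convex hull
df['value'] = 1.0
values = df.value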

#define raster resolution
rRes = .1

#create coord ranges over the desired raster extension
yRange = np.arange(df.Start_Lat.min(),df.Start_Lat.max()+rRes,rRes)
xRange = np.arange(df.Start_Lng.min(),df.Start_Lng.max()+rRes,rRes)

#create arrays of x,y over the raster extension
gridX,gridY = np.meshgrid(xRange, yRange)

#interpolate over the grid
grid_z = griddata(points, values, (gridX,gridY), method='cubic')

print(grid_z.shape)
print((df.Start_Lat.min(),df.Start_Lat.max()))
print((df.Start_Lng.min(),df.Start_Lng.max()))

plt.imshow(grid_z)
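
An alternative to interpolation is to bin the points into a regular grid, so that each raster cell holds an accident count, and then look up the cell containing a given location. This is a minimal sketch with hypothetical points; the function names `count_raster` and `value_at` are illustrative and not from any library.

```python
import numpy as np

def count_raster(lats, lons, res):
    """Bin points into a regular grid; rows follow latitude, columns longitude."""
    lats, lons = np.asarray(lats), np.asarray(lons)
    # floor(span / res) + 1 bins guarantees the last edge lies beyond the maximum
    n_rows = int(np.floor((lats.max() - lats.min()) / res)) + 1
    n_cols = int(np.floor((lons.max() - lons.min()) / res)) + 1
    lat_edges = lats.min() + res * np.arange(n_rows + 1)
    lon_edges = lons.min() + res * np.arange(n_cols + 1)
    raster, _, _ = np.histogram2d(lats, lons, bins=[lat_edges, lon_edges])
    return raster, lat_edges, lon_edges

def value_at(raster, lat_edges, lon_edges, lat, lon):
    """Return the value of the cell containing (lat, lon); assumes the point is inside the grid."""
    row = np.searchsorted(lat_edges, lat, side="right") - 1
    col = np.searchsorted(lon_edges, lon, side="right") - 1
    return raster[row, col]

# Hypothetical points: three accidents close together and one farther away
lats = [37.55, 37.56, 37.57, 38.80]
lons = [-77.48, -77.47, -77.46, -77.17]
raster, lat_edges, lon_edges = count_raster(lats, lons, res=0.1)
print(raster.shape, value_at(raster, lat_edges, lon_edges, 37.56, -77.47))
```

A count raster like this could serve as the accident-likelihood layer mentioned in the notes, with the resolution chosen to match the rasterized OSM features.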


In [None]:
import folium
from folium.raster_layers import ImageOverlay

# Load the raster data
data = ...  # Load the raster data from a file or source

# Define the bounds of the raster data
bounds = ...  # Define the bounds of the raster data in latitude and longitude coordinates

# Create a folium Map object
m = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add the raster data as an image overlay to the map
ImageOverlay(data, bounds, opacity=0.5).add_to(m)

# Display the map
m

#do it with plotly
# Create an imshow plot with plotly
fig = px.imshow(data, x=bounds[0], y=bounds[1], color_continuous_scale="Viridis")

# Set the map projection
fig.update_layout(mapbox_style="open-street-map", mapbox_center_lat=latitude, mapbox_center_lon=longitude, mapbox_zoom=12)

# Display the plot
fig.show()

Create a heatmap with folium

In [None]:
import folium
from folium.plugins import HeatMap

# create base map object using Map()
mapObj = folium.Map(location=[meanLat, meanLong], zoom_start = 7)

# create heatmap layer weighted by the accident count at each location
heat_layer = HeatMap( list(zip(df['lat'], df['lon'], df["count"])),
                   min_opacity=0.5,
                   radius=25, blur=25, 
                   max_zoom=1)
# add heatmap layer to base map
heat_layer.add_to(mapObj)
mapObj

Use matplotlib, geopandas and shapely point geometry to create a point map

In [None]:
#extract state polygon data from census.gov with geopandas.read_file
state_df = gpd.read_file("https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_5m.zip")
state_df.head()


# References

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.”, 2019.

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

"Motor vehicle accidents are the leading cause of death for people aged 1-54 in the United States." - Centers for Disease Control and Prevention. (n.d.). Leading causes of death. Retrieved December 25, 2022, from https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm

"In 2019, over 36,000 people died in car accidents in the US, and millions more were injured." - National Highway Traffic Safety Administration. (2020). Traffic Safety Facts: 2019 Motor Vehicle Crashes: Overview. Retrieved December 25, 2022, from https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812497

"The National Highway Traffic Safety Administration estimates that the economic cost of motor vehicle crashes in the US was over $800 billion in 2020, including costs such as medical expenses, lost productivity, and property damage." - National Highway Traffic Safety Administration. (2020). The Economic and Societal Impact of Motor Vehicle Crashes, 2010 (Revised). Retrieved December 25, 2022, from https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/812013-economic_societal_impact_2010.pdf


https://smoosavi.org/datasets/us_accidents

https://arxiv.org/pdf/1906.05409.pdf 

https://paperswithcode.com/paper/a-countrywide-traffic-accident-dataset/review/ 

https://arxiv.org/abs/1909.09638 

https://gking.harvard.edu/files/0s.pdf

https://www.sciencedirect.com/science/article/pii/S235214651630299X

https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents 

https://www.kaggle.com/code/satyabrataroy/60-insights-extraction-us-accident-analysis/notebook

https://www2a.cdc.gov/nioshtic-2/BuildQyr.asp?s1=20056884&f1=%2A&Startyear=&Adv=0&terms=1&D1=10&EndYear=&Limit=10000&sort=&PageNo=1&RecNo=1&View=f&

http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-chunking 

https://pygis.io/docs/c_rasters.html