# U.S. Traffic Casualty Analysis - Data Wrangling & EDA

Traffic fatalities are a persistent public-safety problem in the United States. This project proposal uses data science to identify factors that contribute to traffic accidents, drawing on data from 2016 to 2021, with the aim of informing targeted interventions that reduce the number of fatalities.

According to the National Highway Traffic Safety Administration (NHTSA), the number of traffic fatalities in the United States from 2016 to 2021 is as follows:
* 2016: 37,806 
* 2017: 37,133 
* 2018: 36,835 
* 2019: 36,096 
* 2020: 38,680 
* 2021: 42,915 

Notably, traffic fatalities rose in 2020 even though fewer vehicles were on the road during the COVID-19 pandemic. The reasons for this increase are complex and multifactorial; contributing factors include more risky behavior, such as speeding and distracted driving, and increased alcohol and drug use.
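To put the yearly totals listed above in perspective, the year-over-year percentage change can be computed directly from them. The snippet below is a minimal sketch in plain Python; the names `fatalities` and `yoy_change` are illustrative and do not appear in the original notebook.

```python
# Yearly totals copied from the list above (as reported in this notebook).
fatalities = {
    2016: 37806,
    2017: 37133,
    2018: 36835,
    2019: 36096,
    2020: 38680,
    2021: 42915,
}

# Percentage change of each year relative to the previous year.
years = sorted(fatalities)
yoy_change = {}
for prev, curr in zip(years, years[1:]):
    delta = fatalities[curr] - fatalities[prev]
    yoy_change[curr] = delta / fatalities[prev] * 100

for year, pct in yoy_change.items():
    print(f"{year}: {pct:+.1f}%")
```

With these figures, the totals decline slightly each year through 2019 and then increase in both 2020 and 2021.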

## 1.1 Imports

In [2]:
# Import pandas, matplotlib.pyplot, seaborn, and os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

**Data** \
accidents: this is a countrywide car accident dataset that covers 49 states of the USA. The accident data were collected from February 2016 to December 2021, using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. Currently, there are about 2.8 million accident records in this dataset.

weather: this is a countrywide weather events dataset that includes 7.5 million events and covers 49 states of the United States. Examples of weather events are rain, snow, storms, and freezing conditions. Some of the events in this dataset are extreme (e.g. storms) and some could be regarded as regular (e.g. rain and snow). The data were collected from January 2016 to December 2021, using historical weather reports from 2,071 airport-based weather stations across the nation.

constructions: this is a countrywide dataset of road construction and closure events that covers 49 states of the US. Construction events in this dataset can be any roadwork, ranging from fixing pavements to substantial projects that could take months to finish. The data were collected from January 2016 to December 2021.

In [None]:
accidents = pd.read_csv('accidents.csv')
weather = pd.read_csv('weather.csv')
constructions = pd.read_csv('constructions.csv')
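Because the three `read_csv` calls above assume the files sit in the working directory, it can help to confirm that they exist before loading. The following is a minimal sketch of one way to do that with the already-imported `os` module. `DATA_DIR`, `FILENAMES`, and `missing_files` are hypothetical names introduced here for illustration; the original notebook does not define them.

```python
import os

# Hypothetical location of the CSV files; "." matches the relative
# paths used in the read_csv calls above.
DATA_DIR = "."
FILENAMES = ["accidents.csv", "weather.csv", "constructions.csv"]


def missing_files(data_dir, filenames):
    """Return the names in `filenames` that are not regular files in `data_dir`."""
    return [
        name
        for name in filenames
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


missing = missing_files(DATA_DIR, FILENAMES)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```

Running this check first turns a missing download into a short, explicit message rather than a `FileNotFoundError` partway through loading.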