# US Accidents
US Accidents is a countrywide traffic accident dataset from Kaggle.com. It contains 4.2 million accident records for the contiguous United States, collected between February 2016 and December 2020.

This dataset can be used to answer the following questions:

* Which state had the highest number of traffic accidents?
* At what time of day do traffic accidents usually occur in the United States?
* Which weather conditions were present when traffic accidents occurred?
* What factors affect the severity of a traffic accident (e.g., time, weather, location)?
* ... and so on!

In [None]:

import numpy as np 
import pandas as pd
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns

# Read the comma-separated values (CSV) file into a DataFrame.

df = pd.read_csv('/kaggle/input/us-accidents/US_Accidents_Dec20_Updated.csv')
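Because the full CSV is large, it can help to load only the columns the analysis actually uses. The sketch below is a minimal illustration, not how this notebook loads the data: it runs `pd.read_csv` with `usecols` and `parse_dates` on a made-up three-row sample. `SAMPLE_CSV` is a hypothetical stand-in for the Kaggle file, and the column list is an assumption based on the columns referenced later in the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV: the column names mirror those used
# in this notebook, but the rows and values are made up.
SAMPLE_CSV = io.StringIO(
    "ID,State,Start_Time,Weather_Condition,Severity,Description\n"
    "A-1,CA,2016-02-08 05:46:00,Light Rain,2,Accident on I-5\n"
    "A-2,TX,2016-02-08 06:07:59,Overcast,3,Lane blocked\n"
    "A-3,CA,2016-02-08 08:49:27,Fair,2,Slow traffic\n"
)

# Load only the columns the analysis needs and parse Start_Time while reading.
sample = pd.read_csv(
    SAMPLE_CSV,
    usecols=["State", "Start_Time", "Weather_Condition", "Severity"],
    parse_dates=["Start_Time"],
)

print(sample.dtypes)
```

With the real file, the path from the cell above would take the place of `SAMPLE_CSV`; checking `dtypes` afterwards confirms whether `Start_Time` was actually parsed as a datetime.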

## US Accidents by State

California was the US state with the most reported traffic accidents.

In [None]:
# Count accidents per state (two-letter codes such as 'CA'), highest first.
# Series.value_counts() replaces the top-level pd.value_counts(), which is deprecated.
state_counts = df['State'].value_counts()

fig = go.Figure(
    data=go.Choropleth(
        locations=state_counts.index,
        z=state_counts.values.astype(float),
        locationmode='USA-states',
        colorscale='SunsetDark',
        colorbar_title='Count'),
    layout=go.Layout(
        title_text='US Accidents by State (Feb 2016 to Dec 2020)',
        title_x=0.5,
        font=dict(family='Verdana', size=12, color='#000000'),
        geo_scope='usa'))

fig.show()
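The map ranks states by color; the same ranking can also be read off numerically. A minimal sketch on made-up data (the `toy` frame is hypothetical, not the real dataset): `value_counts()` sorts counts in descending order, `idxmax()` returns the label with the largest count, and `normalize=True` turns counts into shares of the total.

```python
import pandas as pd

# Hypothetical records standing in for df[['State']]; only the State column matters here.
toy = pd.DataFrame({"State": ["CA", "TX", "CA", "FL", "CA", "TX"]})

# value_counts() returns counts sorted in descending order.
state_counts = toy["State"].value_counts()

# idxmax() gives the index label (state code) with the highest count.
top_state = state_counts.idxmax()

# Share of all records per state, as a fraction between 0 and 1.
state_share = toy["State"].value_counts(normalize=True)

print(top_state, state_counts[top_state])
print(state_share.round(3))
```

On the real data, `df['State']` takes the place of `toy["State"]`.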

## US Accidents by Local Time

More traffic accidents in the United States occurred between 8:00 and 8:59 a.m. local time than in any other hour of the day.

In [None]:
Local_Time = pd.to_datetime(df.Start_Time).dt.hour

In [None]:
# Accidents per hour of day, ordered 0-23 rather than by count.
hour_counts = Local_Time.value_counts().sort_index()

plt.figure(figsize=(10, 6), dpi=100.0)
plt.title('US Accidents by Local Time (Feb 2016 to Dec 2020)', family='Verdana', fontsize=16)
plt.xlabel('Local Time (hour of day)', family='Verdana', fontsize=12)
plt.ylabel('Count', family='Verdana', fontsize=12)
plt.grid(linestyle=':', linewidth=0.25, color='salmon')
ax = sns.barplot(x=hour_counts.index, y=hour_counts.values.astype(float), palette='hls')
# One label per bar, built from the hour itself, e.g. '08:00-08:59' for hour 8.
ax.set_xticklabels([f'{h:02d}:00-{h:02d}:59' for h in hour_counts.index], size=6, rotation=-45)
plt.show()
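To check the peak hour numerically rather than by eye, the hourly counts can be reindexed over all 24 hours (so hours with no accidents appear as 0) and queried with `idxmax()`. A minimal sketch on five made-up timestamps; `start_times` is hypothetical and not the real `Start_Time` column.

```python
import pandas as pd

# Hypothetical start times standing in for df.Start_Time (stored as strings in the CSV).
start_times = pd.Series([
    "2016-02-08 08:15:00",
    "2016-02-08 08:45:10",
    "2016-02-08 17:30:00",
    "2016-02-09 08:05:00",
    "2016-02-09 23:59:59",
])

# Convert to datetimes and keep only the hour of day (0-23).
hours = pd.to_datetime(start_times).dt.hour

# Count accidents per hour, filling hours with no accidents with 0.
per_hour = hours.value_counts().reindex(range(24), fill_value=0)

# idxmax() returns the first hour with the largest count.
peak_hour = per_hour.idxmax()
print(f"Peak hour: {peak_hour:02d}:00-{peak_hour:02d}:59 ({per_hour[peak_hour]} accidents)")
```

On the real data, `Local_Time` from the earlier cell can take the place of `hours`.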

## US Accidents by Weather Condition

Weather conditions such as fog, rain, snow, and thunderstorms were among those reported when accidents occurred. The chart shows the ten conditions reported most often.

In [None]:
# value_counts() already sorts in descending order, so head(10) is the top 10.
Weather_Condition = df.Weather_Condition.value_counts().head(10)

In [None]:
plt.figure(figsize=(10,6), dpi=100.0)
plt.title('US Accidents by Top 10 Weather Conditions (Feb 2016 to Dec 2020)', family='Verdana', fontsize=16)
plt.xlabel('Count', family='Verdana', fontsize=12)
plt.ylabel('Weather Condition', family='Verdana', fontsize=12)
plt.grid(linestyle=':', linewidth=0.25, color='salmon')
sns.barplot(x=Weather_Condition.values, y=Weather_Condition.index, palette='hls')
plt.show()
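One detail worth knowing when ranking weather conditions: `value_counts()` drops missing values by default, so accidents with no weather report do not appear in the top 10 at all. A minimal sketch on made-up labels (`weather` is hypothetical) contrasting the default with `dropna=False`:

```python
import numpy as np
import pandas as pd

# Hypothetical weather labels standing in for df.Weather_Condition; NaN marks a missing report.
weather = pd.Series(["Fair", "Light Rain", "Fair", np.nan, "Fog", "Fair", "Light Rain"])

# Default behavior: missing values are dropped and counts are sorted descending.
top_weather = weather.value_counts().head(3)

# dropna=False keeps missing reports as their own row.
with_missing = weather.value_counts(dropna=False)

# Percentage of all records (including missing ones) for each condition.
share_pct = (with_missing / len(weather) * 100).round(1)

print(top_weather)
print(share_pct)
```

Comparing `with_missing` against the plain counts shows how many accidents the top-10 chart leaves out because no weather condition was recorded.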

## Correlation
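The heatmap shows the pairwise Pearson correlation coefficients between accident severity and the numeric weather measurements. Values range from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).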

In [None]:
# Wind_Direction holds text (e.g. 'SW', 'Calm'), so it is left out:
# in pandas 2.x, DataFrame.corr() raises an error on non-numeric columns.
numeric_cols = ['Severity', 'Temperature(F)', 'Wind_Chill(F)', 'Humidity(%)', 'Pressure(in)',
                'Visibility(mi)', 'Wind_Speed(mph)', 'Precipitation(in)']

plt.figure(figsize=(10, 6), dpi=100.0)
sns.heatmap(data=df[numeric_cols].corr(),
            vmin=-1.0,
            vmax=1.0,
            cmap='plasma',
            annot=True,
            fmt='.2f',
            linewidths=1.0,
            linecolor='white',
            square=True)
plt.show()
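The coefficients in the heatmap are Pearson's r, the default method of `DataFrame.corr()`: the covariance of two columns divided by the product of their standard deviations. The sketch below uses made-up values (the `toy` frame is hypothetical) to compute r by hand, check it against pandas, and show `numeric_only=True` (available in pandas 1.5 and later) as an alternative to listing the numeric columns explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; Wind_Direction is text, like the real column.
toy = pd.DataFrame({
    "Severity": [2, 3, 2, 4, 3],
    "Visibility(mi)": [10.0, 4.0, 9.0, 1.0, 5.0],
    "Wind_Direction": ["SW", "Calm", "N", "W", "SW"],
})

# Pearson's r by hand: sum of co-deviations divided by the square root of the
# product of the summed squared deviations.
x = toy["Severity"].to_numpy(dtype=float)
y = toy["Visibility(mi)"].to_numpy(dtype=float)
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient; numeric_only=True skips text columns
# such as Wind_Direction instead of raising an error.
corr = toy.corr(numeric_only=True)

print(round(r_manual, 3), round(corr.loc["Severity", "Visibility(mi)"], 3))
```

In this toy data, lower visibility goes with higher severity, so r comes out strongly negative; the real dataset's coefficients are whatever the heatmap reports.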