Task:   

At what time do accidents usually occur in the US? 

Produce a ranking of the times at which accidents occur.   
The most common time should be at the top of the list, together with the number of accidents recorded at that time. 

In [None]:
# Python 3 environment with analytics libraries installed,
# as defined by the kaggle/python Docker image

import numpy as np 
import pandas as pd 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
#open CSV file as a pandas dataframe
df = pd.read_csv('../input/us-accidents/US_Accidents_June20.csv')

Loaded as a dataframe, the CSV file contains 49 columns and 3,513,617 rows.   
I chose 5 of these columns to work with.
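
Since only five columns are used, the load itself could be restricted to them. The sketch below is one possible way to do that; it reads a tiny in-memory CSV with made-up rows as a stand-in for the real file, and the `Extra` column is a hypothetical placeholder for the unused columns.

```python
import io
import pandas as pd

# Made-up rows standing in for US_Accidents_June20.csv (illustration only)
sample_csv = io.StringIO(
    "ID,Severity,Start_Time,City,State,Extra\n"
    "A-1,2,2016-02-08 05:46:00,Dayton,OH,x\n"
    "A-2,3,2016-02-08 06:07:59,,OH,y\n"
)

wanted = ['ID', 'Severity', 'Start_Time', 'City', 'State']

# usecols skips the other columns; parse_dates converts Start_Time while reading
sample = pd.read_csv(sample_csv, usecols=wanted, parse_dates=['Start_Time'])
print(sample.dtypes)
```

Parsing `Start_Time` at load time would also avoid converting it again later, at the cost of a slower read.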

In [None]:
df.columns

In [None]:
accidents=df[['ID', 'Severity', 'Start_Time', 'City', 'State']]

In [None]:
accidents.info()

112 rows have no city entry (NaN). That is roughly 0.003% of the 3,513,617 rows, so it is unlikely to affect the results.
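
To judge how much missing data matters, the share of missing values per column can be more telling than the raw count. A minimal sketch on a made-up four-row frame (the values are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'City': ['Dayton', np.nan, 'Houston', 'Austin'],
    'State': ['OH', 'OH', 'TX', 'TX'],
})

# isnull() gives True/False per cell; the mean of booleans is the missing share
missing_share = toy.isnull().mean()
print(missing_share)
```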

In [None]:
accidents.isnull().sum()

Dataframe sorted by start time:

In [None]:
sortedByTime=accidents.sort_values(by=['Start_Time'])
sortedByTime

The **severity** of accidents in the US from **2016 to June 2020** averages about **2**.   
**May 15, 2017 at 9:22 am** is the exact start time with the most reported accidents.   
Houston, TX is the city with the most reported accidents, and California is the state with the most.
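
These statements are read from the `describe(include='all')` summary: for non-numeric columns it reports `top` (the most frequent value) and `freq` (how often that value appears), and for numeric columns it reports `mean`. A minimal sketch on made-up rows, not on the accident data:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'Severity': [2, 2, 3, 2],
    'City': ['Houston', 'Houston', 'Dayton', 'Austin'],
})

summary = toy.describe(include='all')

# 'top' and 'freq' apply to the text column, 'mean' to the numeric one
print(summary.loc[['top', 'freq'], 'City'])
print(summary.loc['mean', 'Severity'])
```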

In [None]:
sortedByTime.describe(include='all')

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt 
%matplotlib inline

In [None]:
# Count accidents per start time and plot them in chronological order
sortedByTime['Start_Time'].value_counts().sort_index().plot(figsize=(12, 6), color='mediumorchid')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
plt.title("Accidents per start time", fontsize=16)

The graph above suggests that accident frequency improved in 2017, and that the count for a single start time has not exceeded 60 again since mid-2016.
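
Counting per exact start time can produce a very dense line. One possible alternative, sketched here on made-up timestamps, is to count accidents per calendar day before plotting.

```python
import pandas as pd

# Made-up start times for illustration only
toy_times = pd.Series(pd.to_datetime([
    '2016-02-08 05:46:00', '2016-02-08 06:07:59', '2016-02-09 07:23:34',
    '2016-02-09 07:23:34', '2016-02-09 18:10:00',
]))

# Floor each timestamp to midnight, count per day, and order chronologically
per_day = toy_times.dt.floor('D').value_counts().sort_index()
print(per_day)

# per_day.plot(figsize=(12, 6)) would then draw the daily trend
```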

In [None]:
# Get frequency count of Start_Time
frequency = sortedByTime['Start_Time'].value_counts()
frequency

Dataframe of date/time values sorted by frequency. The **top 20** are shown below.

In [None]:
df2 = pd.DataFrame(frequency)
plotTop=df2.head(20)
plotTop

In [None]:
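# Move the timestamps out of the index into a regular column
plotTop=plotTop.reset_index()
# Name the columns explicitly; the default names differ between pandas versions
plotTop.columns=['index', 'Start_Time']
plotTop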

Plot of the 10 date/times with the most accidents:

In [None]:
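one = plotTop['index'].head(10)        # date/time labels
two = plotTop['Start_Time'].head(10)   # accident counts

fig, ax = plt.subplots(figsize=(14, 9))

ax.barh(one, two, color='purple')
ax.invert_yaxis()  # put the most frequent date/time at the top
ax.set_xlabel('Number of accidents')
ax.set_title('Top 10 date/time')
plt.show()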

In [None]:
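# Move the timestamps out of the index and name both columns explicitly,
# since the default names after reset_index differ between pandas versions
df3=df2.reset_index()
df3.columns=['date_time', 'counts']
df3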

In [None]:
# Parse the timestamps once, then derive each component from the result
parsed = pd.to_datetime(df3['date_time'])
df3['Dates'] = parsed.dt.date
df3['Months'] = parsed.dt.month
df3['Times'] = parsed.dt.time
df3['Hours'] = parsed.dt.hour
df3['Year'] = parsed.dt.year
df3['Day'] = parsed.dt.day

In [None]:
# Cast to strings so describe() treats these as categories and reports top/freq
df3['Months'] = df3['Months'].astype(str)
df3['Hours'] = df3['Hours'].astype(str)
df3['Year'] = df3['Year'].astype(str)
df3['Day'] = df3['Day'].astype(str)

New dataframe with date/times separated:

In [None]:
df4=df3[['Dates','Times','Months','Day','Year']]
df4

In [None]:
df4.describe(include='all')

With dates and times separated, the most frequent date is **November 6, 2018**.   
**5:24 pm** is the most common time of day.   
**October** is the most common month.   
The **12th** is the most common day of the month.   
**2019** is the most common year in the data, which runs from 2016 to June 2020.   
Note that `df4` has one row per distinct date/time, so these rankings count distinct timestamps rather than individual accidents.
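
The task asks for a ranking of times together with their accident counts. A minimal sketch of one way to produce it, counting every accident once, is shown below. It uses made-up start times, and hour of day is an assumed granularity rather than something the task specifies.

```python
import pandas as pd

# Made-up per-accident start times (illustration only, not from the dataset)
toy = pd.DataFrame({'Start_Time': [
    '2019-10-12 17:24:00', '2019-10-12 17:24:00', '2019-10-12 17:50:10',
    '2018-11-06 08:05:00', '2018-11-06 08:30:00', '2017-05-15 09:22:00',
]})

# Extract the hour of day from each accident's start time
hours = pd.to_datetime(toy['Start_Time']).dt.hour

# value_counts sorts by frequency, so the most common hour comes first
hour_ranking = (
    hours.value_counts()
         .rename_axis('hour')
         .reset_index(name='accidents')
)
print(hour_ranking)
```

On the notebook's own data, a comparable ranking could be built from `df3` with `df3.groupby('Hours')['counts'].sum().sort_values(ascending=False)`, which weights each distinct timestamp by its accident count.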
