<b>ID</b>: This is a unique identifier of the accident record.

<b>Source</b>: Indicates the source of the accident report (i.e., the API which reported the accident).

<b>TMC</b>: A traffic accident may have a Traffic Message Channel (TMC) code, which provides a more detailed description of the event.

<b>Severity</b>: Shows the severity of the accident, a number between 1 and 4, where 1 indicates the least impact on traffic (i.e., short delay as a result of the accident) and 4 indicates a significant impact on traffic (i.e., long delay).

<b>Start_Time</b>: Shows the start time of the accident in the local time zone.

<b>End_Time</b>: Shows the end time of the accident in the local time zone.

<b>Start_Lat</b>: Shows the latitude of the start point in GPS coordinates.

<b>Start_Lng</b>: Shows the longitude of the start point in GPS coordinates.

<b>End_Lat</b>: Shows the latitude of the end point in GPS coordinates.

<b>End_Lng</b>: Shows the longitude of the end point in GPS coordinates.

<b>Distance(mi)</b>: The length of the road extent affected by the accident.

<b>Description</b>: Shows natural language description of the accident.

Address Attributes (9):

<b>Number</b>: Shows the street number in address field.

<b>Street</b>: Shows the street name in address field.

<b>Side</b>: Shows the relative side of the street (Right/Left) in address field.

<b>City</b>: Shows the city in address field.

<b>County</b>: Shows the county in address field.

<b>State</b>: Shows the state in address field.

<b>Zipcode</b>: Shows the zipcode in address field.

<b>Country</b>: Shows the country in address field.

<b>Timezone</b>: Shows timezone based on the location of the accident (eastern, central, etc.).

Weather Attributes (11):

Airport_Code: Denotes an airport-based weather station which is the closest one to location of the accident.

Weather_Timestamp: Shows the time-stamp of weather observation record (in local time).

Temperature(F): Shows the temperature (in Fahrenheit).

Wind_Chill(F): Shows the wind chill (in Fahrenheit).

Humidity(%): Shows the humidity (in percentage).

Pressure(in): Shows the air pressure (in inches).

Visibility(mi): Shows visibility (in miles).

Wind_Direction: Shows wind direction.

Wind_Speed(mph): Shows wind speed (in miles per hour).

Precipitation(in): Shows precipitation amount in inches, if there is any.

Weather_Condition: Shows the weather condition (rain, snow, thunderstorm, fog, etc.).

POI Attributes (13):

Amenity: A Point-Of-Interest (POI) annotation which indicates presence of amenity in a nearby location.

Bump: A POI annotation which indicates presence of speed bump or hump in a nearby location.

Crossing: A POI annotation which indicates presence of crossing in a nearby location.

Give_Way: A POI annotation which indicates presence of give_way sign in a nearby location.

Junction: A POI annotation which indicates presence of junction in a nearby location.

No_Exit: A POI annotation which indicates presence of no_exit sign in a nearby location.

Railway: A POI annotation which indicates presence of railway in a nearby location.

Roundabout: A POI annotation which indicates presence of roundabout in a nearby location.

Station: A POI annotation which indicates presence of station (bus, train, etc.) in a nearby location.

Stop: A POI annotation which indicates presence of stop sign in a nearby location.

Traffic_Calming: A POI annotation which indicates presence of traffic_calming means in a nearby location.

Traffic_Signal: A POI annotation which indicates presence of traffic_signal in a nearby location.

Turning_Loop: A POI annotation which indicates presence of turning_loop in a nearby location.

Period-of-Day (4):

<b>Sunrise_Sunset</b>: Shows the period of day (i.e. day or night) based on sunrise/sunset.

<b>Civil_Twilight</b>: Shows the period of day (i.e. day or night) based on civil twilight.

<b>Nautical_Twilight</b>: Shows the period of day (i.e. day or night) based on nautical twilight.

<b>Astronomical_Twilight</b>: Shows the period of day (i.e. day or night) based on astronomical twilight.
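
The grouped attributes above can be kept as plain Python lists so that later cells can select a whole group at once. The sketch below is only an illustration: the names `ATTRIBUTE_GROUPS` and `missing_columns` are hypothetical, and the column names are copied from the descriptions above rather than read from the CSV.

```python
# Column groups copied from the attribute descriptions above.
# ATTRIBUTE_GROUPS and missing_columns are illustrative names, not part of the dataset.
ATTRIBUTE_GROUPS = {
    "address": ["Number", "Street", "Side", "City", "County", "State",
                "Zipcode", "Country", "Timezone"],
    "weather": ["Airport_Code", "Weather_Timestamp", "Temperature(F)",
                "Wind_Chill(F)", "Humidity(%)", "Pressure(in)",
                "Visibility(mi)", "Wind_Direction", "Wind_Speed(mph)",
                "Precipitation(in)", "Weather_Condition"],
    "poi": ["Amenity", "Bump", "Crossing", "Give_Way", "Junction", "No_Exit",
            "Railway", "Roundabout", "Station", "Stop", "Traffic_Calming",
            "Traffic_Signal", "Turning_Loop"],
    "period_of_day": ["Sunrise_Sunset", "Civil_Twilight",
                      "Nautical_Twilight", "Astronomical_Twilight"],
}


def missing_columns(columns, groups=ATTRIBUTE_GROUPS):
    """Return the documented column names that are absent from `columns`."""
    present = set(columns)
    return [name for names in groups.values() for name in names
            if name not in present]
```

Once the data is loaded, `missing_columns(df.columns)` would list any documented column that the file does not contain.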

In [1]:
import numpy as np 
import pandas as pd 
import json
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
from datetime import datetime
import glob
from scipy.stats import boxcox


In [None]:
df = pd.read_csv('US_Accidents_Dec20.csv')
print("The shape of data is:",(df.shape))
display(df.head(3))


In [None]:
df.dtypes


In [None]:
df.isnull().sum()
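
Raw null counts are hard to compare across columns, so a percentage view can help. A minimal sketch, assuming a small made-up frame (`sample` and its values are hypothetical; only the column names come from the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration; not real accident records.
sample = pd.DataFrame({
    "ID": ["A-1", "A-2", "A-3", "A-4"],
    "TMC": [201.0, None, None, 241.0],
    "Precipitation(in)": [None, None, None, 0.1],
})

# isnull().mean() is the fraction of missing values in each column.
missing_pct = (sample.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

The same expression applied to `df` would rank the real columns by how much data they are missing.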


The following cells count the accidents per state.

In [None]:
state_wise_counts = df.groupby('State')['ID'].count().reset_index()


In [None]:
state_wise_counts.shape


In [None]:
state_wise_counts = state_wise_counts.sort_values(by = "ID",ascending=False)


In [None]:
state_wise_counts.head()
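
The group, count, and sort steps above can also be written as a single `value_counts()` call. The sketch below compares both approaches on a made-up frame (`sample` is hypothetical; only the column names `ID` and `State` come from the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration.
sample = pd.DataFrame({
    "ID": ["A-1", "A-2", "A-3", "A-4", "A-5", "A-6"],
    "State": ["CA", "TX", "CA", "FL", "CA", "TX"],
})

# Same steps as the cells above: count IDs per state, then sort descending.
grouped = (
    sample.groupby("State")["ID"].count()
    .reset_index()
    .sort_values(by="ID", ascending=False)
)

# value_counts() counts and sorts in one call, returning a Series indexed by state.
direct = sample["State"].value_counts()

print(grouped)
print(direct)
```

The two differ in how they treat missing data: `groupby(...)["ID"].count()` skips rows with a null `ID`, while `value_counts()` skips rows with a null `State`.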


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")


In [None]:
f, ax = plt.subplots(figsize=(9, 15))
sns.barplot(y="State", x="ID", data=state_wise_counts, palette="gist_stern")


Next, analyze which road features appear most frequently near accidents.

In [None]:
road_features = ["Amenity", "Bump", "Crossing", "Give_Way", "Junction", "No_Exit", "Railway", "Roundabout", "Station", "Stop", "Traffic_Calming", "Traffic_Signal", "Turning_Loop"]
data = df[road_features].sum().sort_values(ascending=False)

plt.figure(figsize=(18, 8))
plt.title("Most frequent road features")
sns.barplot(x=data.values, y=data.index, color='cyan')
plt.xlabel("Count")
plt.ylabel("Road feature")
plt.show()
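
Because the POI columns are boolean annotations, their mean gives the share of accidents near each feature, which is often easier to read than a raw count. A minimal sketch on made-up values (`sample` is hypothetical; the column names are taken from `road_features`):

```python
import pandas as pd

# Hypothetical boolean annotations for four accidents.
sample = pd.DataFrame({
    "Junction": [True, False, True, False],
    "Traffic_Signal": [True, True, True, False],
    "Bump": [False, False, False, False],
})

# The mean of a boolean column is the fraction of True values.
share_pct = (sample.mean() * 100).sort_values(ascending=False)
print(share_pct)
```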

In [None]:
counts = df["Weather_Condition"].value_counts()[:15]
plt.figure(figsize=(20, 8))

plt.title("Top 15 weather conditions by number of accidents")
sns.barplot(x=counts.index, y=counts.values, palette="Paired")
plt.xlabel("Weather Condition")
plt.ylabel("Count")
plt.show()


In [None]:
for s in np.arange(1,5):
    plt.subplots(figsize=(12,5))
    # value_counts() already sorts in descending order, so head(20) gives the top 20
    df.loc[df["Severity"] == s, "Weather_Condition"].value_counts().head(20).plot.bar(width=0.5, color='r', edgecolor='k', align='center', linewidth=1)
    plt.xlabel('Weather Condition',fontsize=10)
    plt.ylabel('Accident Count',fontsize=10)
    plt.title('20 Most frequent Weather Conditions for Accidents of Severity ' + str(s),fontsize=15)
    plt.xticks(fontsize=10)
    plt.yticks(fontsize=10)

In [None]:
counts = pd.to_datetime(df['Start_Time']).dt.day_name().value_counts()
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

plt.figure(figsize=(20, 8))
plt.title("Number of accidents for each day of the week")
sns.barplot(x=counts.index, y=counts.values, order=weekdays)
plt.xlabel("Weekday")
plt.ylabel("Number of accidents")
plt.show()
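
`value_counts()` orders the days by frequency, so any table printed from it will not follow the calendar. A small sketch of putting the counts in weekday order with `reindex`, using made-up timestamps (`sample_times` is hypothetical):

```python
import pandas as pd

# Hypothetical start times for illustration.
sample_times = pd.Series(pd.to_datetime([
    "2020-12-07 08:00:00",  # Monday
    "2020-12-07 17:00:00",  # Monday
    "2020-12-09 09:30:00",  # Wednesday
    "2020-12-12 23:15:00",  # Saturday
]))

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# reindex puts the counts in calendar order; days with no accidents get 0.
counts_by_day = (
    sample_times.dt.day_name()
    .value_counts()
    .reindex(weekdays, fill_value=0)
)
print(counts_by_day)
```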


In [None]:
df_time = df.loc[:,['ID', 'Start_Time', 'End_Time']]  # converting start end time to date time format
df_time['Start_Time'] = pd.to_datetime(df_time['Start_Time'])
df_time['End_Time'] = pd.to_datetime(df_time['End_Time'])
df_time.info()


In [None]:
df_time['Start_hour'] = df_time['Start_Time'].dt.hour
hours = df_time.groupby(['Start_hour']).count()

hours

In [None]:
f, ax = plt.subplots(figsize = (18, 10))
sns.barplot(x = hours.index, y = 'Start_Time', data = hours, color='green')
plt.xlabel('Time of accident (hour)', labelpad = 10, fontsize=12, weight='bold')
plt.ylabel('Number of accidents', labelpad = 10, fontsize=12, weight='bold')
plt.title('Time of day when accidents usually occur in the US', fontsize = 15, weight = 'bold')


As the chart shows, most accidents occur in the morning, between 6 am and 10 am, plausibly because people are rushing to work.

The next-highest share falls between 3 pm and 6 pm.
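
Reading peaks off a bar chart is approximate, so the share of accidents inside each time window can be computed directly. The sketch below uses made-up start hours (`sample_hours` and the helper `share_in_window` are hypothetical); the real check would pass `df_time['Start_hour']` instead.

```python
import pandas as pd

# Hypothetical start hours (0-23) for eight accidents.
sample_hours = pd.Series([7, 8, 9, 16, 17, 2, 12, 22], name="Start_hour")


def share_in_window(hours, start, end):
    """Percentage of accidents whose start hour h satisfies start <= h < end."""
    in_window = hours.between(start, end - 1)  # between() is inclusive on both ends
    return in_window.mean() * 100


morning = share_in_window(sample_hours, 6, 10)     # 6 am to 10 am
afternoon = share_in_window(sample_hours, 15, 18)  # 3 pm to 6 pm
print(morning, afternoon)
```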

In [None]:

import plotly.graph_objects as go
import matplotlib.ticker as ticker


%matplotlib inline


In [None]:
df_st_ct = df['State'].value_counts()  # pd.value_counts() is deprecated in recent pandas

fig = go.Figure(data=go.Choropleth(
    locations=df_st_ct.index,
    z = df_st_ct.values.astype(float),  # Data to be color-coded
    locationmode = 'USA-states',     # set of locations match entries in `locations`
    colorscale = 'darkmint',
    colorbar_title = "Count",
))

fig.update_layout(
    title_text = 'US Accidents by State',
    geo_scope='usa', # limit map scope to USA
)

fig.show()
