# Analysis of Road Accidents in India 2021
### Problem Statement
Despite efforts to improve road safety, India continues to experience a high number of road accidents, leading to significant loss of life, injuries, and economic costs. Effective solutions are needed to address the root causes of these accidents and foster a culture of road safety nationwide.
### About Project
This project analyzes the relationship between traffic violations and road accidents on Indian National Highways in the year 2021. The dataset covers the different states/UTs and records the accident counts and fatalities attributed to specific traffic violations: over-speeding, drunken driving, driving on the wrong side, jumping red lights, use of mobile phones, and other miscellaneous violations. The project uses these figures to identify patterns and correlations.
### Data Sources
- The data is sourced from the official website of the Government of India: https://data.gov.in/
### About dataset
This dataset focuses on road accidents that occurred in India during the year 2021. For each traffic violation, it records the number of accidents and fatalities attributed to that violation.

Key components of dataset:

- States/UTs: This column identifies the Indian state or union territory that each row describes.
- Accident Categories: Groups of columns prefixed with a violation name ("Over-Speeding", "Drunken Driving", etc.) represent specific traffic violations. Each violation is further divided into sub-columns for:
  - Total Accidents: The total number of accidents associated with that violation on National Highways under each department (NHAI, State PWD, Other).
  - Death: The number of fatalities resulting from accidents related to that violation under each department.
- Total Accident and Death Counts: Separate columns exist for total accidents and deaths on National Highways under each department (NHAI, State PWD, Other) and overall totals.
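
The column names appear to follow a fixed pattern of violation, department, and metric, separated by " - ". The following is a minimal sketch, not part of the original notebook, of how such names could be split into their three parts. The helper `parse_column` is a hypothetical name introduced here for illustration, and the three sample names are copied from the notebook's `df.columns` output.

```python
import pandas as pd

# Column names copied from the notebook's df.columns output. Note the leading
# space on the "Drunken Driving" column, which is present in the source data.
columns = [
    'Over-Speeding - National Highways under NHAI - Total Accidents',
    'Over-Speeding - National Highways under State PWD - Death',
    ' Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death',
]

def parse_column(name):
    """Split a column name into (violation, department, metric).

    Hypothetical helper: assumes the 'X - National Highways under Y - Z' pattern.
    """
    # Strip stray whitespace, then split on the last two ' - ' separators.
    violation, road, metric = name.strip().rsplit(' - ', 2)
    department = road.replace('National Highways under ', '')
    return violation, department, metric

parsed = pd.DataFrame(
    [parse_column(c) for c in columns],
    columns=['violation', 'department', 'metric'],
)
print(parsed)
```

Stripping the name before splitting matters here because some columns in the source carry a leading space, which would otherwise make lookups by violation name fail silently.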
<img src="https://img.freepik.com/free-vector/car-crash-concept-illustration_114360-8000.jpg?t=st=1713200833~exp=1713204433~hmac=b09958b941595b0109fed9d75552325b91ba34354293e5e5a33bc5ed415bf858&w=996">

# Import Statements

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# Read the Data

In [2]:
df = pd.read_csv('Accidents-Data-2020.csv')
df.head()

Unnamed: 0,States/UTs,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
0,Andhra Pradesh,4186.0,1551.0,1605.0,697.0,131.0,60.0,19.0,13.0,6.0,...,341.0,186.0,56.0,18.0,5000.0,1884.0,1975.0,895.0,192.0,79.0
1,Arunachal Pradesh,20.0,8.0,10.0,4.0,0.0,0.0,12.0,5.0,2.0,...,4.0,1.0,0.0,0.0,47.0,21.0,20.0,11.0,0.0,0.0
2,Assam,1254.0,505.0,583.0,318.0,393.0,174.0,85.0,33.0,49.0,...,74.0,45.0,65.0,14.0,1720.0,700.0,750.0,422.0,493.0,200.0
3,Bihar,1611.0,1312.0,518.0,412.0,0.0,0.0,1.0,1.0,0.0,...,340.0,238.0,0.0,0.0,3161.0,2567.0,940.0,718.0,0.0,0.0
4,Chhattisgarh,1748.0,690.0,1388.0,539.0,0.0,0.0,8.0,5.0,4.0,...,138.0,56.0,0.0,0.0,1890.0,771.0,1573.0,619.0,0.0,0.0


# Data Preprocessing

### Check for Duplicate Values

In [3]:
df.duplicated().values.any()

False

The data contains no duplicate rows.
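
Called without arguments, `df.duplicated()` only flags rows that match in every column. As a hedged sketch on a made-up toy frame (not the real dataset), the check can also be restricted to the `States/UTs` key via the `subset` parameter, which would catch a state listed twice with different counts.

```python
import pandas as pd

# Hypothetical toy frame, not the real dataset: the same state appears twice
# with different counts, so the two rows are not identical.
toy = pd.DataFrame({
    'States/UTs': ['Assam', 'Bihar', 'Assam'],
    'Total Accidents': [10.0, 20.0, 12.0],
})

# Compares every column: no two rows match completely.
full_row_dupes = toy.duplicated().any()

# Compares only the key column: 'Assam' is repeated.
key_dupes = toy.duplicated(subset=['States/UTs']).any()

print(full_row_dupes, key_dupes)
```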

### Check for Null Values

In [4]:
df.isnull().values.any()

True

The data contains null values.

In [5]:
missing_rows = df[df.isnull().any(axis=1)]
missing_rows

Unnamed: 0,States/UTs,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
8,Himachal Pradesh,,,,,,,,,,...,,,,,,,,,,
32,Daman and Diu,,,,,,,,,,...,,,,,,,,,,


In the dataset, the rows for **Himachal Pradesh** and **Daman and Diu** contain missing values. To address this gap, we use mean imputation: each missing value is replaced with the mean of its column.

In [7]:
df = df.fillna(df.mean(numeric_only=True))
df.isna().values.any()

False

The dataset no longer contains null values.
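
To make the effect of mean imputation concrete, here is a small sketch on a made-up toy frame (the values are illustrative, not taken from the dataset). It uses the same `fillna(df.mean(numeric_only=True))` call as the notebook, records which rows were filled, and prints the column median alongside for comparison, since a single large value pulls the mean upward.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one very large value.
toy = pd.DataFrame({
    'States/UTs': ['A', 'B', 'C', 'D'],
    'Total Accidents': [10.0, np.nan, 30.0, 2000.0],
})

# Remember which rows had gaps before filling, so imputed rows stay traceable.
was_missing = toy['Total Accidents'].isna()

# Same imputation call as in the notebook: fill each gap with its column mean.
filled = toy.fillna(toy.mean(numeric_only=True))

# The imputed row receives the mean of the observed values (10, 30, 2000).
print(filled.loc[was_missing])

# The median of the observed values, shown only for comparison.
print(toy['Total Accidents'].median())
```

Keeping a mask such as `was_missing` is one way to exclude imputed rows from later state-level comparisons if that turns out to be preferable.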

# Metadata of the Data

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 43 columns):
 #   Column                                                                                                          Non-Null Count  Dtype  
---  ------                                                                                                          --------------  -----  
 0   States/UTs                                                                                                      37 non-null     object 
 1   Over-Speeding - National Highways under NHAI - Total Accidents                                                  37 non-null     float64
 2   Over-Speeding - National Highways under NHAI - Death                                                            37 non-null     float64
 3   Over-Speeding - National Highways under State PWD - Total Accidents                                             37 non-null     float64
 4   Over-Speeding - National Highways under State 

# Preliminary Data Exploration

In [9]:
df.shape

(37, 43)

In [10]:
df.columns

Index(['States/UTs',
       'Over-Speeding - National Highways under NHAI - Total Accidents',
       'Over-Speeding - National Highways under NHAI - Death',
       'Over-Speeding - National Highways under State PWD - Total Accidents',
       'Over-Speeding - National Highways under State PWD - Death',
       'Over-Speeding - National Highways under Other department - Total Accidents',
       'Over-Speeding - National Highways under Other department - Death',
       ' Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents',
       ' Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death',
       ' Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents',
       ' Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Death',
       ' Drunken Driving/ Consumption of alcohol and drug - National Highways under Other department - Total Ac

In [11]:
df.describe()

Unnamed: 0,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Death,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
count,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0,...,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0,37.0
mean,3466.342857,1415.6,1193.542857,372.914286,232.457143,89.942857,131.657143,69.542857,43.657143,26.8,...,304.685714,116.685714,47.714286,25.085714,4602.571429,2007.371429,1659.542857,571.257143,335.085714,145.371429
std,9958.774805,4043.061129,3469.8976,1063.867639,682.745127,262.229855,405.919323,216.28612,130.747837,83.407234,...,898.757129,338.271404,147.543698,79.05391,13186.605244,5746.906187,4803.112361,1626.886159,976.513138,426.287532
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,54.0,13.0,14.0,7.0,0.0,0.0,2.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,112.0,29.0,21.0,11.0,0.0,0.0
50%,866.0,352.0,312.0,118.0,15.0,4.0,10.0,5.0,3.0,1.0,...,54.0,24.0,0.0,0.0,1385.0,700.0,550.0,257.0,34.0,15.0
75%,3466.342857,1448.0,827.0,372.914286,188.0,81.0,83.0,36.0,32.0,10.0,...,304.685714,116.685714,35.0,14.0,4602.571429,2007.371429,1494.0,591.0,289.0,145.371429
max,60661.0,24773.0,20887.0,6526.0,4068.0,1574.0,2304.0,1217.0,764.0,469.0,...,5332.0,2042.0,835.0,439.0,80545.0,35129.0,29042.0,9997.0,5864.0,2544.0
