# Traffic Violations and Road Accidents on Indian National Highways (2021)
This project analyzes the relationship between traffic violations and road accidents on Indian National Highways in 2021. For each state/UT, the dataset records the accident counts and fatalities attributed to specific violations: over-speeding, drunken driving, driving on the wrong side, jumping red lights, use of mobile phones, and other miscellaneous violations. The goal is to identify patterns and correlations in these figures.
## Data Sources
- link
<img src="https://img.freepik.com/free-vector/car-crash-concept-illustration_114360-8000.jpg?t=st=1713200833~exp=1713204433~hmac=b09958b941595b0109fed9d75552325b91ba34354293e5e5a33bc5ed415bf858&w=996">

# Import Statements

In [53]:
import pandas as pd
import matplotlib.pyplot as plt

# Read the Data

In [54]:
df = pd.read_csv('Accidents-Data-2021.csv')
df.head()

Unnamed: 0,States/UTs,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
0,Andhra Pradesh,5167.0,2155.0,1760.0,800.0,113.0,13.0,23.0,8.0,8.0,...,356.0,158.0,4.0,2.0,5937,2603,2148,974,156,25
1,Arunachal Pradesh,32.0,17.0,21.0,10.0,0.0,0.0,18.0,12.0,7.0,...,10.0,5.0,0.0,0.0,89,55,58,32,0,0
2,Assam,1827.0,878.0,697.0,302.0,444.0,185.0,76.0,28.0,53.0,...,3.0,1.0,22.0,9.0,2123,1020,753,330,532,224
3,Bihar,1200.0,904.0,440.0,383.0,0.0,0.0,6.0,2.0,4.0,...,425.0,347.0,0.0,0.0,3403,2726,946,791,0,0
4,Chhattisgarh,1600.0,721.0,1737.0,799.0,0.0,0.0,16.0,8.0,4.0,...,106.0,62.0,0.0,0.0,1734,783,1876,880,0,0


# Data Preprocessing

In [55]:
df.isnull().sum()

States/UTs                                                                                                       0
Over-Speeding - National Highways under NHAI - Total Accidents                                                   1
Over-Speeding - National Highways under NHAI - Death                                                             1
Over-Speeding - National Highways under State PWD - Total Accidents                                              1
Over-Speeding - National Highways under State PWD - Death                                                        1
Over-Speeding - National Highways under Other department - Total Accidents                                       1
Over-Speeding - National Highways under Other department - Death                                                 1
Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents                1
Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI 

In [56]:
# Select the rows that contain at least one missing value
missing_rows = df[df.isnull().any(axis=1)]
missing_rows

Unnamed: 0,States/UTs,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
31,Daman and Diu,,,,,,,,,,...,,,,,0,0,0,0,0,0


The row for **Daman and Diu** contains missing values (its `Total` columns, by contrast, are all 0). To fill these gaps, we apply a **mean imputation strategy**: each missing value is replaced with the mean of its column.

In [57]:
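# Replace each missing value with the mean of its numeric column
df = df.fillna(df.mean(numeric_only=True))
# Count the columns that still contain missing values (expected: 0)
df.isna().any().sum()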

0
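
One side effect of mean imputation is worth checking here. The Daman and Diu row reports 0 in its `Total` columns, so a mean-filled per-violation count can exceed the row's own total. The sketch below illustrates this on a toy frame and compares it with a zero fill. The shortened column names are hypothetical, and only three states from the preview above are included, so the imputed value differs from the one in the real dataset.

```python
import pandas as pd

# Toy frame: values copied from the rows previewed above.
# The short column names are hypothetical stand-ins for the long originals.
toy = pd.DataFrame({
    "States/UTs": ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Daman and Diu"],
    "over_speeding_accidents": [5167.0, 32.0, 1827.0, None],
    "total_accidents": [5937, 89, 2123, 0],
})

# Strategy used in this notebook: fill with the column mean
mean_filled = toy.fillna(toy.mean(numeric_only=True))

# Alternative (an assumption, not the notebook's method): fill with zero
zero_filled = toy.fillna(0)

# Consistency check: a per-violation count should not exceed the row total
inconsistent = mean_filled[
    mean_filled["over_speeding_accidents"] > mean_filled["total_accidents"]
]
print(inconsistent["States/UTs"].tolist())
```

On this toy data the check flags only the imputed row. Whether a zero fill is the better choice for the real dataset depends on why the values are missing, which the source data does not state.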

# Metadata of the Data

In [58]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38 entries, 0 to 37
Data columns (total 43 columns):
 #   Column                                                                                                         Non-Null Count  Dtype  
---  ------                                                                                                         --------------  -----  
 0   States/UTs                                                                                                     38 non-null     object 
 1   Over-Speeding - National Highways under NHAI - Total Accidents                                                 38 non-null     float64
 2   Over-Speeding - National Highways under NHAI - Death                                                           38 non-null     float64
 3   Over-Speeding - National Highways under State PWD - Total Accidents                                            38 non-null     float64
 4   Over-Speeding - National Highways under State PWD - 

# Preliminary Data Exploration

In [59]:
df.shape

(38, 43)

In [60]:
df.columns

Index(['States/UTs',
       'Over-Speeding - National Highways under NHAI - Total Accidents',
       'Over-Speeding - National Highways under NHAI - Death',
       'Over-Speeding - National Highways under State PWD - Total Accidents',
       'Over-Speeding - National Highways under State PWD - Death',
       'Over-Speeding - National Highways under Other department - Total Accidents',
       'Over-Speeding - National Highways under Other department - Death',
       'Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents',
       'Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death',
       'Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents',
       'Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Death',
       'Drunken Driving/ Consumption of alcohol and drug - National Highways under Other department - Total Acciden
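
The column names visible above follow a regular pattern: `<violation> - National Highways under <authority> - <measure>`. A minimal sketch of splitting a name into those three parts is given below. The helper `split_column` and the `ColumnParts` type are hypothetical names introduced for illustration, and the sketch assumes that every data column follows the pattern (`States/UTs` does not, so it would need to be excluded or set as the index first).

```python
from typing import NamedTuple


class ColumnParts(NamedTuple):
    violation: str
    authority: str
    measure: str


PREFIX = "National Highways under "


def split_column(name: str) -> ColumnParts:
    """Split '<violation> - National Highways under <authority> - <measure>'."""
    # " - " (with spaces) is the separator; the hyphen in "Over-Speeding" has none
    violation, road, measure = (part.strip() for part in name.split(" - "))
    if not road.startswith(PREFIX):
        raise ValueError(f"unexpected column name: {name!r}")
    return ColumnParts(violation, road[len(PREFIX):], measure)


columns = [
    "Over-Speeding - National Highways under NHAI - Total Accidents",
    "Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Death",
    "Total - National Highways under Other department - Death",
]
for column in columns:
    print(split_column(column))
```

The resulting tuples could be passed to `pd.MultiIndex.from_tuples` so that columns can be selected by violation, authority, or measure instead of by their full names.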

In [61]:
df.describe()

Unnamed: 0,Over-Speeding - National Highways under NHAI - Total Accidents,Over-Speeding - National Highways under NHAI - Death,Over-Speeding - National Highways under State PWD - Total Accidents,Over-Speeding - National Highways under State PWD - Death,Over-Speeding - National Highways under Other department - Total Accidents,Over-Speeding - National Highways under Other department - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under NHAI - Death,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Total Accidents,Drunken Driving/ Consumption of alcohol and drug - National Highways under State PWD - Death,...,Others - National Highways under State PWD - Total Accidents,Others - National Highways under State PWD - Death,Others - National Highways under Other department - Total Accidents,Others - National Highways under Other department - Death,Total - National Highways under NHAI - Total Accidents,Total - National Highways under NHAI - Death,Total - National Highways under State PWD - Total Accidents,Total - National Highways under State PWD - Death,Total - National Highways under Other department - Total Accidents,Total - National Highways under Other department - Death
count,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0,...,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0,38.0
mean,3722.594595,1660.054054,1239.243243,438.324324,215.72973,88.108108,108.27027,48.918919,32.0,16.054054,...,391.351351,134.540541,57.783784,22.972973,4740.578947,2200.526316,1717.842105,615.210526,321.842105,132.0
std,11170.673958,4952.25926,3734.316149,1303.659632,659.342533,268.680035,340.783886,158.799955,98.870377,50.721255,...,1232.231336,411.138669,184.90657,75.421513,14607.146214,6765.828755,5337.884521,1881.001288,1018.243839,420.62354
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,23.75,17.25,20.0,10.0,0.0,0.0,3.25,0.0,0.25,0.0,...,0.75,0.25,0.0,0.0,41.5,21.5,42.5,8.25,0.0,0.0
50%,1035.0,448.0,331.5,176.0,12.0,3.5,14.5,3.5,4.5,2.0,...,17.0,11.5,0.0,0.0,818.5,284.5,345.0,158.0,18.0,4.5
75%,3443.195946,1673.513514,899.75,424.493243,176.0,75.0,62.5,17.0,15.5,5.75,...,295.5,97.25,25.75,8.75,3316.25,1862.5,1334.25,623.25,171.75,76.5
max,68868.0,30711.0,22926.0,8109.0,3991.0,1630.0,2003.0,905.0,592.0,297.0,...,7240.0,2489.0,1069.0,425.0,90071.0,41810.0,32639.0,11689.0,6115.0,2508.0
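
The summary suggests that the table contains an aggregate row. For `Total - National Highways under NHAI - Total Accidents`, count × mean is 38 × 4740.578947 ≈ 180,142, which is twice the maximum of 90,071. That is what one would expect if a single row holds the sum of the other 37. Such a row would inflate the mean and standard deviation reported here, as well as the column means used for imputation. The sketch below shows one way to detect and exclude such a row. The row label `"Total"`, the short column name, and the toy aggregate value are assumptions for illustration, since the label used in the CSV is not shown.

```python
import pandas as pd

# Toy frame: three states from the preview above plus a hypothetical aggregate row
toy = pd.DataFrame({
    "States/UTs": ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Total"],
    "total_accidents": [5937, 89, 2123, 5937 + 89 + 2123],
})

col = toy["total_accidents"]

# A row is an aggregate candidate if its value equals the sum of all other rows
is_aggregate = col == (col.sum() - col)

# Keep only the per-state rows for summary statistics
states_only = toy[~is_aggregate]

print(toy.loc[is_aggregate, "States/UTs"].tolist())
print(states_only["total_accidents"].describe())
```

On the real data, the same test would need to be confirmed across several columns before any row is dropped.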


# Exploratory Data Analysis - Data Visualisation

## Total Accidents Across States/UTs