# Motor Vehicle Accidents - India

Road accidents notoriously account for the largest share of accidents across all _modes of transportation_. According to the Centers for Disease Control and Prevention (USA), transportation accidents accounted for 31.9 percent of accidental deaths [reported](http://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_04.pdf) in 2010. 

A road accident is traumatic not just for the driver, passengers or third parties involved, but also for the near and dear ones of those directly affected by its outcome. Having experienced a road accident myself, I have often wondered what reasons could be making roads more unsafe each day.

The data used in this notebook is downloaded from [data.gov.in](https://data.gov.in) and analysed using Python with the matplotlib and seaborn libraries.

## Downloading the Dataset

* [Road Accidents in India classified according to various parameters](https://data.gov.in/catalog/road-accidents-india-classified-according-various-parameters?filters%5Bfield_catalog_reference%5D=91420&format=json&offset=12&limit=6&sort%5Bcreated%5D=desc)
* [All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2016](https://data.gov.in/resources/all-india-level-mode-transport-wise-number-persons-died-road-accidents-during-2016)
* [All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2017](https://data.gov.in/resources/all-india-level-mode-transport-wise-number-persons-died-road-accidents-during-2017)
* [All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2018](https://data.gov.in/resources/all-india-level-mode-transport-wise-number-persons-died-road-accidents-during-2018)
* Existing public data set on [Kaggle](https://www.kaggle.com/arindambaruah/indian-road-accidents-data)

Let's begin by downloading the data, and listing the files within the dataset.

In [None]:
#Import required libraries
import numpy as np
import pandas as pd

## Data Preparation and Cleaning

**Data preparation for analyzing road accidents caused by different types of motor vehicles**

The input file covers road accidents caused by both motorised and non-motorised vehicles. For now, I want to see the trend of accidents caused by motorised vehicles only.




In [None]:
#analyzing data for 2016
accidents_by_transport_type_data_2016 = "../input/india-mode-of-transport-deaths-road-accidents2016/All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2016.csv"
accidents_by_type_2016_data = pd.read_csv(accidents_by_transport_type_data_2016)
accidents_by_type_2016_data.head()

In [None]:
def transport_type_header(column_data):
    """Return a list of booleans that is True for header rows, i.e. entries without a "." in them.
    Line-level details about the accidents are not needed, so only header rows are kept."""
    select_row = []
    for transport_type in column_data:
        if transport_type.find(".")< 0:
            select_row.append(True)
        else:
            select_row.append(False)
    return select_row
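
As a possible alternative to the loop above, the same mask can be built with pandas string methods. This is a minimal sketch on a made-up column; the names `header_mask` and `modes` are illustrative and not part of the dataset.

```python
import pandas as pd

def header_mask(column_data):
    """Return True for entries that do not contain a '.' (treated as header rows)."""
    # regex=False makes "." a literal dot rather than a regex wildcard
    return ~column_data.str.contains(".", regex=False)

# Made-up values, only to show the behaviour
modes = pd.Series(["Mode A", "Mode A.1", "Mode B"])
mask = header_mask(modes)
print(mask.tolist())
```

The result is a boolean Series, so it can be combined with another condition using `&` instead of zipping two lists.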

In [None]:
#Slicing dataframe to have details about motorised vehicles and not further details 
acc_by_type_16 = accidents_by_type_2016_data[[a and b for a, b in zip(transport_type_header(accidents_by_type_2016_data['Mode of Transport']),
    list(accidents_by_type_2016_data['Sl. No.']<2.0))]]

#Slicing the dataframe to have number of road deaths caused due to different mode of transport
acc_by_type_16 = acc_by_type_16[['Mode of Transport', 'No. of Offending Driver/Pedestrian - Died',
                                 'No. of Victims - Died','Total Persons Died']]
acc_by_type_16

The same data preparation needs to be done for the 2017 and 2018 data files.

In [None]:
#analyzing data for 2017
accidents_by_transport_type_data_2017 = "../input/india-mode-of-transport-deaths-road-accidents2016/All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2017.csv"
accidents_by_type_2017_data = pd.read_csv(accidents_by_transport_type_data_2017)

#Slicing dataframe to have details about motorised vehicles and not further details 
acc_by_type_17 = accidents_by_type_2017_data[[a and b for a, b in zip(transport_type_header(accidents_by_type_2017_data['Mode of Transport']),
    list(accidents_by_type_2017_data['Sl. No.']<2.0))]]

#Slicing the dataframe to have number of road deaths caused due to different mode of transport
acc_by_type_17 = acc_by_type_17[['Mode of Transport', 'No. of Offending Driver/Pedestrian - Died',
                                 'No. of Victims - Died','Total Persons Died']]
acc_by_type_17

In [None]:
#analyzing data for 2018
accidents_by_transport_type_data_2018 = "../input/india-mode-of-transport-deaths-road-accidents2016/All India Level Mode of Transport-wise Number of Persons Died in Road Accidents during 2018.csv"
accidents_by_type_2018_data = pd.read_csv(accidents_by_transport_type_data_2018)

#Slicing dataframe to have details about motorised vehicles and not further details 
acc_by_type_18 = accidents_by_type_2018_data[[a and b for a, b in zip(transport_type_header(accidents_by_type_2018_data['Mode of Transport']),
    list(accidents_by_type_2018_data['Sl. No.']<2.0))]]

#Slicing the dataframe to have number of road deaths caused due to different mode of transport
acc_by_type_18 = acc_by_type_18[['Mode of Transport', 'No. of Offending Driver/Pedestrian - Died',
                                 'No. of Victims - Died','Total Persons Died']]
acc_by_type_18
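
The three cells above repeat the same steps for each year. One way to avoid the repetition, sketched here under the assumption that every yearly file has the same column names, is a small helper. The function name `prepare_deaths_by_mode` and the demo rows are hypothetical; only the column names come from the cells above.

```python
import pandas as pd

DEATH_COLUMNS = [
    "Mode of Transport",
    "No. of Offending Driver/Pedestrian - Died",
    "No. of Victims - Died",
    "Total Persons Died",
]

def prepare_deaths_by_mode(raw):
    """Keep header rows with 'Sl. No.' below 2.0 and only the death columns."""
    # Same two conditions as in the cells above, expressed as boolean Series
    is_header = ~raw["Mode of Transport"].str.contains(".", regex=False)
    is_selected = raw["Sl. No."] < 2.0
    return raw.loc[is_header & is_selected, DEATH_COLUMNS]

# Made-up rows, only to exercise the helper
demo = pd.DataFrame({
    "Sl. No.": [1.0, 1.1, 2.0],
    "Mode of Transport": ["Mode A", "Mode A.1", "Mode B"],
    "No. of Offending Driver/Pedestrian - Died": [10, 4, 3],
    "No. of Victims - Died": [5, 2, 1],
    "Total Persons Died": [15, 6, 4],
})
result = prepare_deaths_by_mode(demo)
print(result)
```

With such a helper, each year would reduce to one `pd.read_csv` call followed by one call to the helper.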

In [None]:
#Combining Dataset from all 3 years (2016-2018)
deaths_by_mode_last3yrs = acc_by_type_17.merge(acc_by_type_18,on="Mode of Transport",how = "inner",suffixes = ("_2017", "_2018"))
deaths_by_mode_last3yrs = acc_by_type_16.merge(deaths_by_mode_last3yrs,on="Mode of Transport",how = "inner")
deaths_by_mode_last3yrs
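
Note that in the merge above the 2016 columns keep their original names: `merge` applies suffixes only to column names that overlap, and after the first merge nothing overlaps with the 2016 frame except the key. If consistent labels for all three years are preferred, one option is to rename the columns before merging. The sketch below uses made-up numbers, and `label_year` is a hypothetical helper; the rest of this notebook still relies on the unsuffixed 2016 names.

```python
import pandas as pd

KEY = "Mode of Transport"

def label_year(frame, year):
    """Append _<year> to every column except the merge key."""
    return frame.rename(columns=lambda c: c if c == KEY else f"{c}_{year}")

# Made-up numbers for illustration only
y16 = pd.DataFrame({KEY: ["Mode A", "Mode B"], "Total Persons Died": [10, 20]})
y17 = pd.DataFrame({KEY: ["Mode A", "Mode B"], "Total Persons Died": [11, 19]})
y18 = pd.DataFrame({KEY: ["Mode A", "Mode B"], "Total Persons Died": [12, 18]})

combined = (
    label_year(y16, 2016)
    .merge(label_year(y17, 2017), on=KEY, how="inner")
    .merge(label_year(y18, 2018), on=KEY, how="inner")
)
print(list(combined.columns))
```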

## Exploratory Analysis and Visualization

Let's review the data and see whether the total number of deaths in road accidents has been increasing year over year.


Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 11
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'


In [None]:
#Let's take a quick look at the trend of total deaths in road accidents between 2016 and 2018
plt.plot([2016,2017,2018],deaths_by_mode_last3yrs.tail(1).loc[:,['Total Persons Died','Total Persons Died_2017',
                                                                 'Total Persons Died_2018']].transpose(),
        marker = 'x')

plt.xscale("linear")
plt.xlabel("Years")
plt.ylabel("Total Persons Died")
plt.title("Number of Deaths due to Road Accidents (2016-18)")
plt.show()

It's great to see that the number of deaths in road accidents has decreased over these three years.

Now, let's review how each mode of transport contributed to the deaths in those three years.

In [None]:
death_cols = ['Total Persons Died','Total Persons Died_2017','Total Persons Died_2018']
share_cols = ['Total Persons Died%','Total Persons Died%_2017','Total Persons Died%_2018']
deaths_by_mode_last3yrs[share_cols] = deaths_by_mode_last3yrs.loc[:7,death_cols]/deaths_by_mode_last3yrs.loc[8,death_cols]*100
deaths_by_mode_last3yrs
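
The cell above computes each mode's share as its deaths divided by the total row, times 100, and picks the rows by position (rows 0 to 7 for the modes, row 8 for the total). A minimal sketch of the same arithmetic that selects the total by label instead is shown below. The numbers are made up, and the label `"Total"` is an assumption about how the total row is named.

```python
import pandas as pd

# Made-up numbers; "Total" is an assumed label for the total row
deaths = pd.DataFrame({
    "Mode of Transport": ["Mode A", "Mode B", "Total"],
    "Total Persons Died": [30, 70, 100],
})

by_mode = deaths.set_index("Mode of Transport")["Total Persons Died"]
total = by_mode["Total"]

# share = deaths for the mode / total deaths * 100
share = by_mode.drop("Total") / total * 100
print(share)
```

Selecting by label keeps the calculation correct even if the number of modes in the file changes.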

In [None]:
data = deaths_by_mode_last3yrs.loc[:7,['Mode of Transport','Total Persons Died%','Total Persons Died%_2017','Total Persons Died%_2018']].transpose()
matplotlib.rcParams['figure.figsize'] = (15, 10)
plt.plot([2016,2017,2018],data.iloc[1:,:],marker = 'X')
plt.legend(labels = data.iloc[0,:])
plt.xlabel("Years")
plt.ylabel("Percentage of total deaths")
plt.show()

It's interesting to see that while overall deaths have been reducing, the share of deaths involving two-wheelers has been steadily increasing.

Let's look at the distribution of deaths by mode of transport for 2018.

In [None]:
plt.pie(data.iloc[3,:],labels = data.iloc[0,:],autopct='%1.1f%%')
plt.show()

## Asking and Answering Questions

Let's review data from the dataset



#### Q1: Which mode of transport has resulted in the maximum loss of lives?

In [None]:
deaths_by_mode_last3yrs.loc[:7,:].sort_values(['Total Persons Died','Total Persons Died_2017','Total Persons Died_2018'],ascending=False).head(1)['Mode of Transport']
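
Sorting by several columns orders rows by the first column and uses the remaining ones only to break ties, so the cell above effectively ranks the modes by their 2016 figure. One alternative, sketched here with made-up numbers, is to rank by the sum across the three years. The variable names are illustrative.

```python
import pandas as pd

# Made-up numbers for illustration only
deaths = pd.DataFrame({
    "Mode of Transport": ["Mode A", "Mode B", "Mode C"],
    "Total Persons Died": [50, 40, 10],
    "Total Persons Died_2017": [45, 60, 12],
    "Total Persons Died_2018": [40, 70, 15],
})
year_cols = ["Total Persons Died", "Total Persons Died_2017", "Total Persons Died_2018"]

# Rank by the first column only (what a multi-column sort amounts to without ties)
top_by_2016 = deaths.sort_values(year_cols, ascending=False).head(1)["Mode of Transport"].iloc[0]

# Rank by the three-year total instead
three_year_total = deaths.set_index("Mode of Transport")[year_cols].sum(axis=1)
top_by_total = three_year_total.idxmax()

print(top_by_2016, top_by_total)
```

In this made-up example the two rankings disagree, which is why the choice of ranking rule is worth stating alongside the answer.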

#### Q2: Which mode of transport has seen the largest year-over-year increase in deaths?

In [None]:
deaths_by_mode_last3yrs['2016-2017']=(deaths_by_mode_last3yrs['Total Persons Died_2017']-deaths_by_mode_last3yrs['Total Persons Died'])/deaths_by_mode_last3yrs['Total Persons Died']*100
deaths_by_mode_last3yrs['2017-2018']=(deaths_by_mode_last3yrs['Total Persons Died_2018']-deaths_by_mode_last3yrs['Total Persons Died_2017'])/deaths_by_mode_last3yrs['Total Persons Died_2017']*100
deaths_by_mode_last3yrs
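
The two new columns hold the percentage change between consecutive years, i.e. (new - old) / old * 100. A small worked sketch with made-up numbers follows; `percent_change` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import pandas as pd

def percent_change(old, new):
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

# Made-up numbers for illustration only
deaths = pd.DataFrame(
    {"Total Persons Died": [100, 50], "Total Persons Died_2017": [110, 40]},
    index=["Mode A", "Mode B"],
)

change = percent_change(deaths["Total Persons Died"], deaths["Total Persons Died_2017"])
largest_increase = change.idxmax()
print(change)
print(largest_increase)
```

Here Mode A rises from 100 to 110 (about +10 percent) while Mode B falls from 50 to 40 (about -20 percent), so `idxmax` picks Mode A.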

In [None]:
deaths_by_mode_last3yrs.sort_values(['2016-2017','2017-2018'],ascending=False).head(1)[['Mode of Transport','2016-2017','2017-2018']]

#### Q3: Which mode of transport has seen the largest year-over-year decrease in deaths?

In [None]:
deaths_by_mode_last3yrs.sort_values(['2016-2017','2017-2018'],ascending=True).head(2)[['Mode of Transport','2016-2017','2017-2018']]

#### Q4: Which mode of travel has the maximum driver deaths?

In [None]:
deaths_by_mode_last3yrs.loc[1:7,:].sort_values(['No. of Offending Driver/Pedestrian - Died','No. of Offending Driver/Pedestrian - Died_2017','No. of Offending Driver/Pedestrian - Died_2018'],ascending=False).head(1)[['Mode of Transport','No. of Offending Driver/Pedestrian - Died','No. of Offending Driver/Pedestrian - Died_2017','No. of Offending Driver/Pedestrian - Died_2018']]

#### Q5: Which mode of travel results in the maximum loss of lives for victims?

In [None]:
deaths_by_mode_last3yrs.loc[1:7,:].sort_values(['No. of Victims - Died','No. of Victims - Died_2017','No. of Victims - Died_2018'],ascending=False).head(1)[['Mode of Transport','No. of Victims - Died','No. of Victims - Died_2017','No. of Victims - Died_2018']]

## Inferences and Conclusion

It's great to see that the number of deaths in road accidents has been steadily decreasing, but it's alarming to see the rate at which deaths involving two-wheelers have increased. This may be because two-wheelers have fewer safety features, which leads to more fatal accidents than with any other mode of transport.