# Introduction
***
Talking about mental health in India is taboo. Even when people know they are suffering mentally, they tend not to open up about it to others, even to those who are very close and dear to them. Even more bizarre is the fact that attempting suicide is an offence under Section 309 of the Indian Penal Code. <br>

But let's see how disturbing the situation in India really is, even though death, especially by suicide, is never a pleasant thing to analyze.

# Importing Necessary Packages and Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import seaborn as sns
import plotly.express as px
import numpy as np

%matplotlib inline

In [None]:
# loading the data into raw_data Data Frame
raw_data = pd.read_csv("../input/suicides-in-india/Suicides in India 2001-2012.csv")

# Data Cleaning and Validation

## Let's Check for Missing Values

In [None]:
plt.rcParams['figure.figsize'] = [12, 8] 
plt.rcParams['figure.dpi'] = 100
sns.set_style('darkgrid')

raw_data.isna().sum().plot(kind = 'bar')
plt.title("Number of NaN Values in Each Column",fontsize = 14,color = "#073B4C" )
plt.yticks([])
plt.xticks(fontsize = 15,color = '#073B4C')
plt.annotate(xy =(4,0) ,xytext = (1.5,0),text = 'As we can see, no NULL values. Lucky me!',fontsize = 15,color = "#EF476F")
plt.show()
plt.clf()

<b>Good thing there are no missing values. Less work for us!</b>

## Let's Have a Brief Look at the Data Frame and See What the Columns Are About

In [None]:
display(raw_data.head())

# raw_data.pivot_table(index = ['Type_code','Type'],values = 'Total', aggfunc = 'sum')

## Checking for data types

In [None]:
# Checking the Data Type
display(raw_data.dtypes)

# Let's check which categories are in the Gender column
print("\nUnique values in the Gender column: \n",raw_data.Gender.unique())

# Let's check which age groups are in the Age_group column
print("\nUnique values in the Age_group column: \n",raw_data.Age_group.unique())

# Let's check which years the data covers (min/max do not assume the rows are sorted by year)
print(f"\nThe data is from {raw_data.Year.min()} to {raw_data.Year.max()}")

<ol><li>The <b>State</b> column has the object dtype; let's convert it to a string data type</li>
<li>The <b>Gender</b> column has the object dtype; let's convert it to a categorical data type</li>
<li>The <b>Age_group</b> column has the object dtype; let's convert it to an ordered categorical data type</li>
</ol>

***
<b>Type_code</b> and <b>Type</b> are clearly the variable-name columns used to convert this data frame into long format: Type_code names the breakdown (for example, Causes) and Type names the category within it. We will use pivot_table or groupby to extract data with them.<br>
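To see how the Type_code/Type pair works, here is a minimal sketch on a toy frame shaped like raw_data. The numbers and the "By Hanging" label are made up for illustration. Filtering on one Type_code and pivoting Type into columns turns the long format back into a wide table:

```python
import pandas as pd

# Toy long-format frame mimicking the Type_code/Type layout (values are made up)
long_df = pd.DataFrame({
    'Year': [2001, 2001, 2001, 2002, 2002, 2002],
    'Type_code': ['Causes', 'Causes', 'Means_adopted',
                  'Causes', 'Causes', 'Means_adopted'],
    'Type': ['Family Problems', 'Mental Illness', 'By Hanging',
             'Family Problems', 'Mental Illness', 'By Hanging'],
    'Total': [10, 4, 7, 12, 5, 9],
})

# Keep one breakdown (Type_code) and spread its categories (Type) into columns
causes_wide = (long_df[long_df['Type_code'] == 'Causes']
               .pivot_table(index='Year', columns='Type',
                            values='Total', aggfunc='sum'))
print(causes_wide)
```

On the real data, the same pattern with `raw_data` in place of `long_df` would give one column per cause and one row per year.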


## Converting the columns to suitable Data Types

In [None]:
# Changing State Column to a string Data type
raw_data['State'] = raw_data.State.astype('str')

# Changing Gender column to a categorical data type
raw_data['Gender'] = raw_data.Gender.astype('category')

# Changing Age_group column to an ordered categorical data type
age_cats = ['0-14' ,'15-29' ,'30-44' ,'45-59' ,'60+' ,'0-100+']
cat_dtype = pd.api.types.CategoricalDtype(
    categories= age_cats, ordered=True)
raw_data['Age_group'] = raw_data.Age_group.astype(cat_dtype)

# checking data types again
raw_data.dtypes
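Why bother with an *ordered* categorical? Plain strings sort alphabetically, which puts `'0-100+'` before `'0-14'`. An ordered dtype sorts and compares by the declared order instead. A small sketch using the same `age_cats` list as the cell above:

```python
import pandas as pd

age_cats = ['0-14', '15-29', '30-44', '45-59', '60+', '0-100+']
cat_dtype = pd.api.types.CategoricalDtype(categories=age_cats, ordered=True)

ages = pd.Series(['60+', '0-100+', '0-14', '15-29']).astype(cat_dtype)

# Sorting follows the declared category order, not alphabetical order
print(ages.sort_values().tolist())   # ['0-14', '15-29', '60+', '0-100+']
# Comparisons with a category value are allowed because the dtype is ordered
print((ages > '15-29').tolist())     # [True, True, False, False]
# Plain string sorting, for contrast
print(sorted(ages.astype(str)))      # ['0-100+', '0-14', '15-29', '60+']
```

Because `'0-100+'` is listed last, it also lands last in sorted output and in plots that respect the category order.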

## Let's Check the Individual Columns

### Let's Check the State Column

In [None]:

print(*raw_data['State'].unique(),sep = '\n')

***
Three values in the State column represent aggregate totals rather than an individual State or UT, and we can use them to our advantage:
  <ol>
        <li><b>Total (All India)</b></li>
        <li><b>Total (States)</b></li>
        <li><b>Total (Uts)</b></li>
    </ol>
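One way to use these aggregate rows is as a validation check. The sketch below assumes that, for each year, Total (States) plus Total (Uts) should equal Total (All India). It runs on toy rows with made-up numbers, not on raw_data. `Series.isin` separates the three aggregate rows from the individual States and UTs:

```python
import pandas as pd

AGGREGATE_ROWS = ['Total (All India)', 'Total (States)', 'Total (Uts)']

# Toy rows in the same shape as raw_data (numbers are made up)
toy = pd.DataFrame({
    'State': ['Kerala', 'Goa', 'Delhi (Ut)',
              'Total (States)', 'Total (Uts)', 'Total (All India)'],
    'Year': [2001] * 6,
    'Total': [30, 5, 8, 35, 8, 43],
})

is_aggregate = toy['State'].isin(AGGREGATE_ROWS)
individual = toy[~is_aggregate]      # one row per state/UT
aggregates = toy[is_aggregate]       # the three summary rows

# Sanity check: States + UTs should add up to the All-India figure each year
by_row = aggregates.pivot_table(index='Year', columns='State',
                                values='Total', aggfunc='sum')
consistent = (by_row['Total (States)'] + by_row['Total (Uts)']
              == by_row['Total (All India)'])
print(consistent.all())   # True for this toy data
```

On the real data, a `False` for some year would be a hint to look more closely before relying on the aggregate rows.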

# Total Suicides in India Over Years(2001-2012)

In [None]:
# importing geopandas and fuzzywuzzy
import geopandas as gpd
from fuzzywuzzy import process

# loading the India states shapefile
india = gpd.read_file('../input/india-gis-data/India States/Indian_states.shp')


In [None]:
# condition matching the three aggregate rows in raw_data
condition = (raw_data.State == 'Total (All India)') | (
    raw_data.State == 'Total (States)') | (raw_data.State == 'Total (Uts)')

# creating a new Data Frame with that condition
states_uts = raw_data[~condition]

# creating a pivot table of total suicides in each state and UT from 2001-2012
total_suicides_states_uts = states_uts.pivot_table(index='State',
                                                   values='Total',
                                                   aggfunc='sum')
total_suicides_states_uts.reset_index(inplace=True)


# Renames each State/UT in df to its closest match in geo_df.st_nm so the two frames can be merged
def state_name_matching(df, geo_df):
    # Build an exact name -> best-match mapping; unlike str.replace, this cannot
    # accidentally rewrite part of another, longer name
    mapping = {name: process.extractOne(name, geo_df.st_nm)[0]
               for name in df.State.unique()}
    df['State'] = df.State.map(mapping)

state_name_matching(total_suicides_states_uts,india)

# Function for merging the India GeoDataFrame with the calculated suicide totals
def merging_for_map(df, geo_df):
    merged_df = geo_df.merge(df,
                             left_on='st_nm',
                             right_on='State',
                             how='outer')
    merged_df.drop(columns='State', inplace=True)
    merged_df = merged_df.rename(columns={'Total': 'Total_Suicides'})
    merged_df.fillna(0, inplace=True)
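    # Telangana was carved out of Andhra Pradesh in 2014, after this data period;
    # approximate it as half of Andhra Pradesh's total (AP keeps its full value)
    ap_total = merged_df.loc[merged_df['st_nm'] == 'Andhra Pradesh',
                             'Total_Suicides'].iloc[0]
    merged_df.loc[merged_df['st_nm'] == 'Telangana', 'Total_Suicides'] = ap_total / 2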
    return merged_df

# Merging india geo panda with total_suicides_states_uts Data Frame
map_of_suicides = merging_for_map(total_suicides_states_uts,india)
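The name-matching step above depends on fuzzywuzzy's `process` module. If that package is not available, a similar mapping can be sketched with the standard library's `difflib.get_close_matches`. This is a swapped-in technique: it scores with difflib's `SequenceMatcher` ratio rather than fuzzywuzzy's scorer, so it may choose differently. The `0.6` cutoff and the reference names below are assumptions for illustration; the real names live in `india.st_nm`:

```python
import difflib

def match_names(names, reference_names, cutoff=0.6):
    """Map each name to its closest reference name, or None if nothing scores >= cutoff."""
    mapping = {}
    for name in names:
        hits = difflib.get_close_matches(name, reference_names, n=1, cutoff=cutoff)
        mapping[name] = hits[0] if hits else None
    return mapping

# Hypothetical reference spellings, standing in for india.st_nm
reference = ['Andhra Pradesh', 'Arunachal Pradesh', 'Odisha']
mapping = match_names(['Andhra Pradesh', 'Orissa', 'Unknown Region'], reference)
print(mapping)
```

Unlike fuzzywuzzy's `extractOne` with its default cutoff of 0, this returns `None` for poor matches. That makes silent mismatches easier to spot before the merge.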

## Visualizing The Total Suicides in India(2001-2012)

In [None]:
# creating a separate data frame to store total suicides in India over the years
Total_Sucides_India = raw_data[raw_data.State == 'Total (All India)'].groupby(
    'Year').Total.sum().reset_index()
# display(Total_Sucides_India)

# converting Year to strings (e.g. '2001') for the x-axis labels
Total_Sucides_India['Year'] = Total_Sucides_India.Year.astype('str')

#Plotting and customizing
sns.set_style('darkgrid')

#Creating plots
fig, ax = plt.subplots()
sns.barplot(data=Total_Sucides_India,
            x='Year',
            y='Total',
            ax=ax,
            color='#EF476F')
sns.lineplot(data=Total_Sucides_India,
             x='Year',
             y='Total',
             ax=ax,
             color='#118AB2')

#Labeling the plot
plt.title("Overall Suicides in India from 2001-2012",
          fontsize=30,
          color='#073B4C')
plt.xticks(color='#073B4C', fontsize=20)
plt.yticks(color='#073B4C', fontsize=20)
plt.xlabel('Year(2001-2012)', fontsize=25, color='#073B4C')
plt.ylabel('Number of Suicides', fontsize=25, color='#073B4C')
fig.set_size_inches([28, 12])
plt.show()
plt.clf()

<ol>
        <li>Every year, more than a quarter of a million suicides are recorded in this data.</li>
        <li>Unfortunately, the trend is increasing rather than declining.</li>
    </ol>
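Before reading absolute numbers off this chart, one thing is worth checking. If every Type_code breakdown (Causes, Education_Status, Means_adopted, Professional_Profile, Social_Status) describes the same set of suicides, then summing Total across all Type_codes counts each suicide once per breakdown. The sketch below compares per-Type_code yearly totals on toy data with hypothetical labels. On raw_data, roughly equal columns would suggest restricting to a single Type_code when quoting absolute counts:

```python
import pandas as pd

def totals_by_type_code(df):
    """Yearly totals for each Type_code breakdown (Year rows x Type_code columns)."""
    return df.pivot_table(index='Year', columns='Type_code',
                          values='Total', aggfunc='sum')

# Toy data: 10 suicides in 2001, described once by cause and once by means
toy = pd.DataFrame({
    'Year': [2001, 2001, 2001, 2001],
    'Type_code': ['Causes', 'Causes', 'Means_adopted', 'Means_adopted'],
    'Type': ['Family Problems', 'Illness', 'By Hanging', 'By Poison'],
    'Total': [6, 4, 7, 3],
})

per_code = totals_by_type_code(toy)
print(per_code)
print(toy.groupby('Year')['Total'].sum())   # 20: each toy suicide counted twice
```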

## Indian Map representing overall Suicides in India

In [None]:
plt.rcParams['figure.figsize'] = [12, 8]
sns.set_style('white')
map_of_suicides.plot(column='Total_Suicides', cmap='BuPu', legend=True)
plt.title("Total Suicides in India (2001-2012)",x = 0.3,color = '#073B4C')
plt.axis('off')
plt.show()

  <ol>
        <li><b>Maharashtra</b> is the suicide capital of India, with the highest total</li>
        <li><b>West Bengal</b> is right behind Maharashtra</li>
        <li>Maharashtra is also infamous for its farmer suicides</li>
    <li>Neither Maharashtra nor West Bengal is the most populous state of India (Uttar Pradesh is), so absolute counts alone do not tell the whole story</li>
    </ol>
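Because absolute counts favour populous states, a rate per 100,000 people is a fairer comparison. This dataset has no population column, so the sketch below assumes a separate population Series indexed by the same state names. The numbers are placeholders for illustration only, not census figures:

```python
import pandas as pd

def rate_per_100k(suicides, population):
    """Suicides per 100,000 people, aligned on the shared (state) index."""
    return suicides / population * 100_000

# Placeholder numbers for illustration only, NOT real census figures
suicides = pd.Series({'State A': 2_000, 'State B': 1_500})
population = pd.Series({'State A': 4_000_000, 'State B': 1_000_000})

rates = rate_per_100k(suicides, population)
print(rates)
```

pandas aligns the two Series on their index, so any state whose name differs between the two sources comes out as `NaN`. This is one more reason the name-matching step matters.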

## Let's Look at Overall Suicides by State and UT

In [None]:
display(total_suicides_states_uts.sort_values('Total',ascending = False).reset_index(drop =True))

### The Total suicides in Every State and UT from 2001-2012

In [None]:
total_suicides_state_wise = total_suicides_states_uts.sort_values('Total',ascending = False).reset_index(drop =True).copy()
total_state_suicides = sns.barplot(x = 'Total',y ='State',data = total_suicides_state_wise,color ='#EF476F' )
total_state_suicides.figure.set_size_inches([18,10])
plt.title("Total State/UT-wise Suicides (2001-2012)",fontsize = 15,color = '#073B4C')
plt.xticks(color = '#073B4C')
plt.yticks(color = '#073B4C')
plt.xlabel('Total Suicides',fontsize = 15,color = '#073B4C')
plt.ylabel('States and UTs',fontsize = 15,color = '#073B4C')
plt.show()
plt.clf()

### Total suicides in all States and UT's from 2001-2012

In [None]:
# creating a separate data frame containing only the Total (States) and Total (Uts) rows
Total_Sucides = raw_data[(raw_data.State == 'Total (States)' ) | (raw_data.State == 'Total (Uts)')]

# Renaming the State column to India
Total_Sucides = Total_Sucides.rename(columns = {'State': 'India'})

# creating a pivot table with yearly totals for States and UTs
data_for_bar = Total_Sucides.pivot_table(
    index='Year',
    columns=['India'],
    values='Total',
    aggfunc='sum')
# displaying the pivot table
display(data_for_bar)

In [None]:
# creating a stacked bar with state total and UT's Total
sns.set()
data_for_bar.plot(
    kind='bar',
    stacked=True,
    color=['#EF476F', '#118AB2'],
    )
plt.title("Suicides in States and UTs")
plt.show()
plt.clf()

# Gender-wise Suicide Analysis

## Proportion of Males and Females in Total Suicides in India (2001-2012)

In [None]:
plt.rcParams['figure.figsize'] = [6, 8]
gender_wise_data = raw_data[
    raw_data['State'] == 'Total (All India)'].pivot_table(
        index='Gender', values='Total', aggfunc='sum').reset_index()
plt.pie(gender_wise_data['Total'],
        autopct='%1.1f%%',
        shadow=True,
        labels=gender_wise_data.Gender,
       colors = ['#EF476F', '#118AB2'])
plt.legend(labels=gender_wise_data.Gender)
plt.show()

<b>A Swarm Plot to Get a Sense of the Distribution</b>

In [None]:
# Extracting total suicides with a Gender Categorical column
gender_data = states_uts.groupby(['Year','Gender'])['Total'].sum().reset_index(level=['Gender'])
# plotting a swarm plot of yearly totals per gender
gender_swarm = sns.swarmplot(x = 'Gender',y = 'Total',data = gender_data,palette = ['#EF476F', '#118AB2'])
gender_swarm.figure.set_size_inches(8,6)
plt.xlabel('Gender',color = '#073B4C')
plt.ylabel('Total Suicides',color = '#073B4C')
plt.xticks(color = '#073B4C')
plt.yticks(color = '#073B4C')
plt.title("Gender Wise Swarm plot",color = '#073B4C')
plt.show()

<ul><li>Men's share of suicides in India is about <b>28 percentage points</b> larger than women's. </li>
  <li>Yearly total female suicides never crossed <b>250000</b> and ranged from <b>200000 to 250000</b></li>
    <li>Yearly total male suicides never crossed <b>450000</b>, well above the female range</li>
    </ul>
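The gap between two pie-chart shares is measured in percentage points, and that is not the same as "X% more". The sketch below uses a hypothetical 64/36 split, not a value read from the chart, to show the difference:

```python
def compare_shares(male_pct, female_pct):
    """Return (gap in percentage points, how many percent larger the male share is)."""
    point_gap = male_pct - female_pct
    relative_pct_more = (male_pct / female_pct - 1) * 100
    return point_gap, relative_pct_more

# Hypothetical 64/36 split, for illustration only
gap, more = compare_shares(64.0, 36.0)
print(f"{gap:.1f} percentage points, or {more:.1f}% more")
# 28.0 percentage points, or 77.8% more
```

So a 28-point gap between shares corresponds to far more than "28% more" suicides in relative terms.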

## Analyzing Gender-wise Suicide Proportions Across Age Groups

 <b>Table Representing Gender and Age Wise Suicides</b>

In [None]:
gender_age = states_uts[states_uts['Age_group'] != '0-100+']

gender_suicide_proportions = gender_age.pivot_table(index=['Year', 'Gender'],
                                                    values='Total',
                                                    columns='Age_group',
                                                    aggfunc='sum',
                                                    margins=True)
# Divide each age-group count by its row total ('All') to get proportions
age_cols = ['0-14', '15-29', '30-44', '45-59', '60+']
gender_suicide_proportions[age_cols] = gender_suicide_proportions[age_cols].div(
    gender_suicide_proportions['All'], axis=0)

display(gender_suicide_proportions.drop('0-100+', axis='columns'))
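pandas can also produce row-wise proportions in a single call, using `pd.crosstab(..., normalize='index')`. Here is a sketch on toy rows with made-up numbers. On the real data, `index` would presumably be `[gender_age['Year'], gender_age['Gender']]`:

```python
import pandas as pd

# Toy rows shaped like gender_age (numbers are made up)
toy = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Age_group': ['15-29', '30-44', '15-29', '30-44'],
    'Total': [30, 10, 12, 8],
})

# normalize='index' divides each row by its own total, so every row sums to 1
shares = pd.crosstab(index=toy['Gender'], columns=toy['Age_group'],
                     values=toy['Total'], aggfunc='sum', normalize='index')
print(shares)
```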

 <b>A Sunburst Chart representing the proportions table</b>

In [None]:
fig = px.sunburst(gender_age,
                  path=['Gender', 'Age_group'],
                  values='Total',
                  color='Gender',  # color_discrete_map maps values of this column
                  color_discrete_map={
                      'Male': '#EF476F',
                      'Female': '#118AB2'
                  })
fig.show()

 <b>A Bar Chart representing the proportions table</b>

In [None]:
x = pd.DataFrame(
    gender_age.groupby(['Year', 'Gender', 'Age_group']).Total.sum())
x.reset_index(level=['Gender', 'Age_group'], inplace=True)
bar = sns.barplot(
    x=x.index,
    y='Total',
    data=x,
    hue='Age_group',
    estimator=sum,  # add the Male and Female rows; the default estimator is the mean
    ci=None,
)
bar.figure.set_size_inches([28, 12])
plt.xlabel('Year(2001-2012)')
plt.ylabel('Total Suicides')
plt.title('Suicides by Age Group')
plt.show()

## Observations from above Visualizations
<ul>
        <li>It's clear from the <b>Suicides Proportion Table</b>, the <b>Sunburst Chart</b> and <b>The Bar chart</b> above that people aged <b>15-44</b> account for the bulk of suicides.</li>
        <li>The number of suicides tends to<b> fall after age 45 </b>for both genders.</li>
    <li>Suicides among people aged <b>0-14 years</b> and <b>above 60</b>, who tend to be the <b>non-working/dependent population</b>, are very few compared with the 15-44 age group</li>
</ul>
  

## Visualizing Female and Male Suicides on the India Map

In [None]:
# Extracting female suicides data
Female_suicides_data = states_uts[states_uts.Gender == 'Female'].groupby('State').Total.sum()
Female_suicides_data = pd.DataFrame(Female_suicides_data).reset_index()

# Calling the helper functions to prepare the data for plotting
state_name_matching(Female_suicides_data,india)
Female_suicides_data = merging_for_map(Female_suicides_data,india)

In [None]:
# Extracting male suicides data
Male_suicides_data = states_uts[states_uts.Gender == 'Male'].groupby('State').Total.sum()
Male_suicides_data = pd.DataFrame(Male_suicides_data).reset_index()

# Calling the helper functions to prepare the data for plotting
state_name_matching(Male_suicides_data,india)
Male_suicides_data = merging_for_map(Male_suicides_data,india)

In [None]:
# Creating plots
fig , ax = plt.subplots(1,2)

# Plotting and customizing the female data
Female_suicides_data.plot(column = 'Total_Suicides',cmap = 'BuPu',legend = True,ax = ax[0])
ax[0].set_frame_on(False)
ax[0].set_xticks([])
ax[0].set_yticks([])
ax[0].set_title("Female Suicides in India(2001-2012)",fontsize = 15,x = 0.3,color = '#073B4C')

# Plotting and customizing the male data
Male_suicides_data.plot(column = 'Total_Suicides',cmap = 'BuPu',legend = True,ax = ax[1])
ax[1].set_frame_on(False)
ax[1].set_xticks([])
ax[1].set_yticks([])
ax[1].set_title("Male Suicides in India(2001-2012)",fontsize = 15,x = 0.3,color = '#073B4C')
fig.set_size_inches([28,12])
plt.show()

 <ol>
        <li><b>Maharashtra</b> has the highest concentration of male suicides</li>
        <li><b>West Bengal</b> has the highest concentration of female suicides</li>
 </ol>
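Both maps show absolute counts, so the state with the most female suicides may simply be the state with the most suicides overall. A complementary view is each state's female share. Below is a sketch on toy data with made-up states and numbers; `states_uts` would be the real input:

```python
import pandas as pd

def female_share_by_state(df):
    """Fraction of each state's suicides that are female."""
    by_gender = df.pivot_table(index='State', columns='Gender',
                               values='Total', aggfunc='sum')
    return by_gender['Female'] / by_gender.sum(axis=1)

# Toy data (states and numbers are made up)
toy = pd.DataFrame({
    'State': ['State A', 'State A', 'State B', 'State B'],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Total': [100, 300, 50, 50],
})
share = female_share_by_state(toy)
print(share)
```

In this toy example, State A has more female suicides in absolute terms, yet State B has the higher female share.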

# Analyzing the Causes of the Suicides

In [None]:
data_causes = pd.DataFrame(raw_data[raw_data.Type_code == 'Causes'].groupby('Type').Total.sum().sort_values())
display(data_causes.sort_values('Total',ascending = False))

In [None]:
suicides_per_cause = sns.barplot(x = data_causes.index,y= data_causes.Total,color = '#EF476F')
suicides_per_cause.figure.set_size_inches([28,12])
plt.xticks(rotation = 90 ,fontsize = 25)
plt.yticks(fontsize = 25)
plt.axhline(y = data_causes.Total.max(),linestyle = '--')
plt.show()

Setting aside "causes not known" and "other causes", the three main causes of suicide appear to be:
    <ol>
        <li><b>Family Problems</b></li>
        <li><b>Other Prolonged Illness</b></li>
        <li><b>Mental Illness</b></li>
    </ol>
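Excluding the catch-all categories and picking the top three can be done programmatically instead of by eye. This is a minimal sketch on toy totals. The exact label strings in raw_data may differ from the toy ones, so they should be copied from `data_causes.index`. With the real data, the input would be `data_causes['Total']`:

```python
import pandas as pd

def top_causes(cause_totals, exclude, n=3):
    """Largest n causes after dropping the catch-all labels listed in `exclude`."""
    return cause_totals.drop(labels=exclude, errors='ignore').nlargest(n)

# Toy totals with hypothetical labels and numbers
toy_totals = pd.Series({
    'Causes Not known': 900, 'Other Causes': 800,
    'Family Problems': 500, 'Other Prolonged Illness': 400,
    'Mental Illness': 150, 'Love Affairs': 100,
})
top3 = top_causes(toy_totals, exclude=['Causes Not known', 'Other Causes'])
print(top3)
```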

# Analyzing the Other aspects of Data

## Means Adopted to commit Suicides

In [None]:
Means_adopted = pd.DataFrame(raw_data[raw_data.Type_code == 'Means_adopted'].groupby('Type').Total.sum().sort_values())
display(Means_adopted)
sns.barplot(y=Means_adopted.index, x=Means_adopted.Total,color = '#EF476F')
plt.title('Means_adopted')
plt.show()

## Education Status

In [None]:
Education_Status = pd.DataFrame(
    raw_data[raw_data.Type_code == 'Education_Status'].groupby(
        'Type').Total.sum().sort_values())
display(Education_Status)
sns.barplot(y=Education_Status.index, x=Education_Status.Total,color = '#EF476F')
plt.title('Education_Status')
plt.show()

## Professional Profile

In [None]:
Professional_Profile = pd.DataFrame(
    raw_data[raw_data.Type_code == 'Professional_Profile'].groupby(
        'Type').Total.sum().sort_values())
display(Professional_Profile)
sns.barplot(y=Professional_Profile.index, x=Professional_Profile.Total,color = '#EF476F')
plt.title('Professional_Profile')
plt.show()

## Social Status

In [None]:
Social_Status = pd.DataFrame(
    raw_data[raw_data.Type_code == 'Social_Status'].groupby(
        'Type').Total.sum().sort_values())
display(Social_Status)
sns.barplot(y=Social_Status.index,
            x=Social_Status.Total,
            color = '#EF476F')
plt.title('Social_Status')
plt.show()
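The four sections above (Means Adopted, Education Status, Professional Profile, Social Status) repeat the same filter, groupby and plot steps, and copy-paste errors in such repeated cells are easy to make. A small helper keeps the aggregation in one place. The function name is hypothetical, and the toy rows are made up:

```python
import pandas as pd

def totals_for_type_code(df, type_code):
    """Sum Total per Type for one Type_code, sorted ascending like the cells above."""
    subset = df[df['Type_code'] == type_code]
    return subset.groupby('Type')['Total'].sum().sort_values()

# Toy rows (labels and numbers are made up)
toy = pd.DataFrame({
    'Type_code': ['Social_Status', 'Social_Status', 'Social_Status', 'Education_Status'],
    'Type': ['Married', 'Never Married', 'Married', 'Primary'],
    'Total': [5, 3, 4, 7],
})
social = totals_for_type_code(toy, 'Social_Status')
print(social)
# With the real data, each section could then plot, for example:
# s = totals_for_type_code(raw_data, 'Social_Status')
# sns.barplot(y=s.index, x=s.values, color='#EF476F'); plt.title('Social_Status')
```

Passing the Type_code string once drives both the data and the title, so a section cannot end up plotting another section's frame.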

## Final Observations

From the above Data we can infer that <br>
    <ol>
        <li>A large portion of the women who commit suicide are<b> housewives</b></li>
        <li><b>Married</b> people form a large part of the suicides, which is consistent with the leading identified cause, "Family Problems"</li>
        <li>People who never studied beyond the secondary level, or who never had any formal education, account for the largest number of suicides</li>
        <li>Hanging and poisoning appear to be the most common means of suicide</li></ol>