# Suicide Analysis in India

In this notebook we will try to understand the possible reasons why people committed suicide in India, using the dataset "Suicides in India". Almost 11,89,068 people committed suicide in 2012 alone, so it is important to understand why and to look for ways to mitigate it.


In [None]:
# import lib
import numpy as np #for math operations
import pandas as pd #for manipulating dataset
import matplotlib.pyplot as plt #for visualization
import seaborn as sns
%matplotlib inline
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_palette("BrBG")

# read dataset
df = pd.read_csv('../input/suicides-in-india/Suicides in India 2001-2012.csv')
df.tail(10)

# Understand dataset

In [None]:
df.info()

# Check for missing values

Luckily, the dataset has no missing values.

In [None]:
df.isna().sum()

# How many people committed suicide from 2001-12?

In [None]:
print("Total cases from 2001-12: \n",df.groupby("Year")["Total"].sum())
df.groupby("Year")["Total"].sum().plot(kind="line")
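
One caveat on the yearly totals: if each `Type_code` is a separate breakdown of the same deaths (an assumption about how this dataset is organised, not something verified here), then summing `Total` over every row counts each case once per breakdown, and the `Total (...)` aggregate rows add to it again. A minimal sketch on made-up numbers (the small frame below is hypothetical, not taken from the dataset) that restricts the sum to one `Type_code` and to non-aggregate rows:

```python
import pandas as pd

# Hypothetical toy data: the same 100 deaths in 2012 reported under two
# breakdowns, plus an aggregate row that repeats them once more.
toy = pd.DataFrame({
    "State": ["A", "A", "A", "A", "Total (All India)"],
    "Year": [2012, 2012, 2012, 2012, 2012],
    "Type_code": ["Causes", "Causes", "Education_Status",
                  "Education_Status", "Education_Status"],
    "Total": [60, 40, 70, 30, 100],
})

# Summing every row counts the same deaths three times.
naive = toy.groupby("Year")["Total"].sum()

# Keep a single breakdown and drop the aggregate rows before summing.
aggregate_rows = ["Total (States)", "Total (Uts)", "Total (All India)"]
one_breakdown = toy[(toy["Type_code"] == "Causes")
                    & ~toy["State"].isin(aggregate_rows)]
deduplicated = one_breakdown.groupby("Year")["Total"].sum()

print(naive.loc[2012], deduplicated.loc[2012])
```

If the assumption holds for the real data, the same filter applied to `df` would give a lower and more realistic yearly total.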

# Which states are present in the dataset?

This step lets us spot states that appear under different spellings, or redundant state names, so that they can be merged.

In [None]:
df["State"].value_counts()
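
If `value_counts()` did reveal the same state under several spellings, one way to merge them is a replacement map. The sketch below uses hypothetical spelling variants and a hypothetical `canonical` mapping; they are illustrations, not values confirmed to be in the dataset.

```python
import pandas as pd

# Hypothetical toy data with two spellings of one state and a trailing space
toy = pd.DataFrame({
    "State": ["Delhi (Ut)", "Delhi", "A & N Islands", "A & N Islands "],
    "Total": [5, 7, 1, 2],
})

# Hypothetical mapping from a variant spelling to the canonical name
canonical = {"Delhi (Ut)": "Delhi"}

# Strip stray whitespace first, then map the variants onto one name
toy["State"] = toy["State"].str.strip().replace(canonical)

# Rows that now share a name are summed together
merged = toy.groupby("State")["Total"].sum()
print(merged)
```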

Remove the aggregate rows whose `State` is Total (States), Total (All India) or Total (Uts); otherwise they would be counted on top of the individual states.

In [None]:
aggregate_rows = ["Total (States)", "Total (Uts)", "Total (All India)"]
df = df[~df["State"].isin(aggregate_rows)]

# Which gender tends to commit more suicide?

It looks like Males tend to commit more suicides compared to Females in India.

In [None]:
filter_gender = pd.DataFrame(df.groupby("Gender")["Total"].sum()).reset_index()
sns.catplot(x="Gender", y="Total", kind="bar", data=filter_gender);
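
The bar chart shows the direction of the gap but not its size. A minimal sketch, on hypothetical toy numbers rather than the dataset, of turning the per-gender totals into percentage shares and a male-to-female ratio:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Total": [40, 20, 25, 15],
})

by_gender = toy.groupby("Gender")["Total"].sum()

# Share of all cases per gender, in percent
share = (by_gender / by_gender.sum() * 100).round(1)

# How many male cases there are per female case
male_to_female = by_gender["Male"] / by_gender["Female"]

print(share)
print(male_to_female)
```

The same three lines could be applied to `df` in place of `toy`.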

# In which states do people tend to commit more suicide?

From the given visualization it is clear that the top 3 states with maximum suicide cases are<br>
1. Maharashtra<br>
2. West Bengal<br>
3. Tamil Nadu<br>

In [None]:
filter_state = pd.DataFrame(df.groupby(["State"])["Total"].sum()).reset_index()
sns.barplot(y = 'State', x = 'Total',data = filter_state, edgecolor = 'w')
plt.show()
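
Reading the top states off a long bar chart is error-prone, so it can help to compute the ranking directly. A minimal sketch with `nlargest`, using hypothetical state labels and numbers:

```python
import pandas as pd

# Hypothetical toy data; "A".."D" stand in for state names
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D", "A", "B"],
    "Total": [10, 30, 20, 5, 15, 1],
})

state_totals = toy.groupby("State")["Total"].sum()

# The three states with the highest totals, largest first
top3 = state_totals.nlargest(3)
print(top3)
```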

## Let's create a WordCloud... why not :P

Words with larger font size are the States which have higher number of suicide cases.

In [None]:
from wordcloud import WordCloud
# map each state name to its total number of cases
count = filter_state.set_index("State")["Total"].to_dict()

wordcloud = WordCloud(width=1280,height=720,relative_scaling=1,background_color='white',normalize_plurals=False).generate_from_frequencies(count)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

# How has the number of cases changed over time?

From the previous bar chart we know that males commit more suicides than females, but not how fast the number of cases is growing.<br>

This plot shows a steeper positive slope for males than for females, **which suggests that more males might commit suicide in the future if the trend continues**.

In [None]:
grouped_year = df.groupby(["Year","Gender"])["Total"].sum()
grouped_year = pd.DataFrame(grouped_year).reset_index()
# grouped_year
sns.lmplot(x="Year", y="Total", hue="Gender", data=grouped_year,height=8.27, aspect=11.7/8.27);
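
To put a number on "steeper slope", one option is to fit an ordinary least-squares line per gender, the kind of line `lmplot` draws by default. A minimal sketch with `np.polyfit` on hypothetical toy numbers (the years are shifted to start at 0, which leaves the slope unchanged and keeps the fit well conditioned):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Year": [2001, 2002, 2003] * 2,
    "Gender": ["Male"] * 3 + ["Female"] * 3,
    "Total": [100, 110, 120, 60, 64, 68],
})

slopes = {}
for gender, grp in toy.groupby("Gender"):
    years = grp["Year"] - grp["Year"].min()  # shift so the first year is 0
    slope, intercept = np.polyfit(years, grp["Total"], deg=1)
    slopes[gender] = slope  # additional cases per year

print(slopes)
```

With only twelve yearly points per gender in the real data, such a slope describes the past trend and is a weak basis for forecasting.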

# Number of cases based on the reason for suicide ("Type_code")

Note: I interpret "Causes" as other causes (the dataset does not define it clearly).

In [None]:
filter_type_code = pd.DataFrame(df.groupby(["Type_code","Year"])["Total"].sum()).reset_index()
filter_type_code
sns.catplot(x="Type_code", y="Total",hue="Year", kind="bar", data=filter_type_code,height=8.27, aspect=11.7/8.27);
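
A grouped bar chart with twelve hues is hard to read exactly, so a table of the same numbers can be a useful companion. A minimal sketch with `pivot_table`, on hypothetical toy numbers:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Type_code": ["Causes", "Causes", "Social_Status", "Social_Status", "Causes"],
    "Year": [2001, 2002, 2001, 2002, 2001],
    "Total": [10, 12, 10, 12, 5],
})

# One row per Type_code, one column per year, cells hold summed totals
table = toy.pivot_table(index="Type_code", columns="Year",
                        values="Total", aggfunc="sum")
print(table)
```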

# Which social status accounts for more suicides?

It appears that **married people** account for the majority of suicide cases.

This could reflect marital conflict, but married people are also likely to make up a large share of the adult population, so raw counts alone do not show that marriage raises the risk.

In [None]:
filter_social_status = pd.DataFrame(df[df["Type_code"]=="Social_Status"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_social_status,height=8.27, aspect=11.7/8.27);
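
To check whether a category really accounts for a majority, the counts can be converted to shares within each gender. A minimal sketch using `groupby(...).transform("sum")`, with hypothetical category labels and numbers:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Type": ["Married", "Never Married", "Married", "Never Married"],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Total": [70, 30, 60, 40],
})

# Divide each row by the total for its own gender
toy["Share"] = toy["Total"] / toy.groupby("Gender")["Total"].transform("sum")
print(toy)
```

A share above 0.5 for a category means it holds a majority within that gender.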

# What was the education status of people who committed suicides?

It appears that people with a low level of education account for more suicides.<br>

People with a diploma or a graduate degree account for the fewest.

In [None]:
filter_education = pd.DataFrame(df[df["Type_code"]=="Education_Status"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
g = sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_education,height=8.27, aspect=11.7/8.27);
g.set_xticklabels(rotation=90)

# What was the profession of the people who committed suicides?

**Farmers** and **housewives** tend to commit more suicide compared to others.

This is plausible: many Indian farmers carry debt and their livelihood depends on the yield of their crops. If the yield is poor they may be unable to clear the debt, and in the worst case they might commit suicide.

> Global warming, monsoon delay, drought etc can lead to bad yield.

Housewives might face problems in their marriage, which could be one reason for such a high number of cases.
> Domestic violence, dowry, gender discrimination, etc might be some of the reasons for housewives to commit suicide.

In [None]:
filter_profession = pd.DataFrame(df[df["Type_code"]=="Professional_Profile"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
g = sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_profession,height=8.27, aspect=11.7/8.27);
g.set_xticklabels(rotation=90)
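
With many professions on the x-axis, sorting the bars by total makes the leading categories easier to see. A minimal sketch that builds such an ordering with pandas, using hypothetical profession labels and numbers:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Type": ["Farming", "House Wife", "Student",
             "Farming", "House Wife", "Student"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Total": [50, 45, 10, 5, 0, 8],
})

# Categories sorted by their combined total, largest first
order = (toy.groupby("Type")["Total"].sum()
            .sort_values(ascending=False)
            .index.tolist())
print(order)
```

The resulting list can be passed to `sns.catplot` through its `order` parameter so that the bars appear in that sequence.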

# Which age group accounts for the most suicides?

The visualization below shows that young people (15-29) and the middle-aged (30-44) account for the largest number of suicides.

It can be due to several reasons like:
* unemployment
* academic stress
* bad friend circle
* farmers (since they have to be young and strong enough to do farming)
* addictions

In [None]:
# age group 0-100+ covers all the other age groups, so it makes sense to drop it
filter_age = df[df["Age_group"]!="0-100+"]
# note: with kind="bar" each bar shows the mean of Total per row (seaborn's default estimator), not the sum
sns.catplot(x="Age_group", y="Total", kind="bar", data=filter_age,height=8.27, aspect=11.7/8.27);
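
By default a seaborn bar plot shows the mean of `Total` over the rows in each category, which can rank age groups differently from the total number of cases. A minimal sketch, on hypothetical toy numbers, that computes both so they can be compared:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "Type_code": ["Causes"] * 4 + ["Social_Status"],
    "Age_group": ["0-14", "15-29", "15-29", "30-44", "0-100+"],
    "Total": [2, 30, 20, 40, 92],
})

# Drop the catch-all age group, as in the cell above
with_age = toy[toy["Age_group"] != "0-100+"]

age_sum = with_age.groupby("Age_group")["Total"].sum()    # total cases
age_mean = with_age.groupby("Age_group")["Total"].mean()  # what the bar height shows

print(age_sum)
print(age_mean)
```

In this toy example the 15-29 group has the largest sum while the 30-44 group has the largest mean, so it is worth checking which of the two a claim about "the maximum number of suicides" relies on.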

# Conclusion

* Males tend to commit more suicides compared to Females in India
* Highest no. of suicide cases occur in Maharashtra, West Bengal, and Tamil Nadu.
* Males might commit more suicides than females in the future if this trend continues.
* People who commit suicide are mostly:
    * Married
    * Farmers and housewives
    * Youngsters (15-29 age) and middle age (30-44)