# Analysis of Suicides in India

This dataset comprises suicide data for India for the years 2001-2012. The data is well prepared and cleaned, with no null values.
Now let's take a look at the columns in this dataset:
1. State : Name of the state in India.
2. Year : The year in which suicides were committed.
3. Type_code : It has five values: Causes, Means_adopted, Professional_Profile, Education_Status and Social_Status.
4. Type : The specific factor within the Type_code category (for example, a particular cause or profession).
5. Gender : Male or Female.
6. Age_group : The age group the row refers to.
7. Total : The total number of suicides recorded in that particular row.

## The major questions which we will try to answer using the data:

Suicide is a serious problem, and to understand and prevent suicides we need insights into some questions.
Some questions which we will try to answer in this analysis are:

* Que.1> Which states in India have the highest number of suicides?
* Que.2> What is the trend of suicides in the major states? (Through a heatmap)
* Que.3> Which causes/factors contributed the most? (Causes, Means_adopted, Professional_Profile, Education_Status, Social_Status)
* Que.4> In which age group did people commit the most suicides?
* Que.5> What is the yearly trend of total suicides?

Further, we will dive deeper into an analysis of suicides among students based on the above questions.

IMPORTING THE LIBRARIES

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

IMPORTING THE DATASET

In [None]:
data = pd.read_csv('../input/suicides-in-india/Suicides in India 2001-2012.csv')

EXPLORING THE DATASET

In [None]:
data.head(10)

CHECK FOR NULL VALUES

In [None]:
data.info()
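
`data.info()` shows the non-null count per column, which has to be compared against the row count by eye. A more direct check is sketched below on a small made-up frame (`toy` and its values are hypothetical, not taken from the dataset); the same two calls could be applied to `data`.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "State": ["A", "B", None],
    "Total": [10, None, 5],
})

# Number of missing values in each column.
null_counts = toy.isna().sum()
print(null_counts)

# True only when no column contains a missing value.
has_no_nulls = not toy.isna().any().any()
print(has_no_nulls)
```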

CREATING THE COPY OF DATASET

In [None]:
data2 = data.copy()

GROUPING THE DATASET BY STATE AND NUMBER OF SUICIDES

In [None]:
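# Total suicides per state, sorted from highest to lowest
grp = data2.groupby('State')['Total'].sum()
total_suicides = pd.DataFrame(grp).reset_index().sort_values('Total',ascending=False)
# Drop the two largest entries (presumably aggregate rows rather than individual states)
total_suicides = total_suicides[2:]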
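
Slicing off the first two rows depends on the sort order. If the rows being skipped are aggregate entries, filtering them by label may be more robust. The sketch below assumes such rows have a `State` label starting with "Total"; that naming pattern and the toy values are assumptions for illustration, so check `data['State'].unique()` before relying on it.

```python
import pandas as pd

# Toy data; the "Total (...)" label is an assumed naming pattern for aggregate rows.
toy = pd.DataFrame({
    "State": ["Total (All India)", "Goa", "Kerala", "Goa", "Total (All India)"],
    "Total": [100, 20, 30, 5, 60],
})

# Keep only rows whose State label does not start with "Total".
states_only = toy[~toy["State"].str.startswith("Total")]

# Sum per state and sort from highest to lowest.
state_totals = (
    states_only.groupby("State")["Total"]
    .sum()
    .reset_index()
    .sort_values("Total", ascending=False)
)
print(state_totals)
```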

IMPORTING LIBRARIES FOR VISUALIZATION AND PLOTTING THE NUMBER OF SUICIDES STATEWISE IN DESCENDING ORDER.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(18,6))
g = sns.barplot(x='State', y='Total', data=total_suicides, ax=ax)
# Rotate the state names so that they do not overlap
plt.xticks(rotation=45)

*FROM THE ABOVE GRAPH WE CAN CLEARLY SEE THAT THE TOP STATES ARE MAHARASHTRA, WEST BENGAL, TAMIL NADU, ANDHRA PRADESH AND KARNATAKA.*

NOW WE WILL BUILD A TABLE WHICH SHOWS NUMBER OF SUICIDES IN EVERY STATE YEARWISE.

In [None]:
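# Total suicides per state and year
x = data.groupby(['State','Year'])['Total'].sum()
y = pd.DataFrame(x).reset_index()
# Reshape: one row per state, one column per year
y = y.pivot(index='State',columns='Year')
y['sum'] = y.sum(axis=1)
# Column-wise totals per year (reused later for the yearly trend plot)
yearly_total = y.sum(axis=0)
y = y.sort_values('sum',ascending=False)
# Skip the two largest entries (presumably aggregate rows) and keep the next 12
y = y[2:14]
# Scale the values down by a factor of 10
y = y/10
y = y.drop('sum',axis=1)
y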
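
The same kind of state-by-year table can also be built in one step with `pivot_table`. This is a minimal sketch on made-up numbers (the `toy` frame is hypothetical), not a replacement for the cell above, which also produces `yearly_total`.

```python
import pandas as pd

# Toy data with several rows per state and year (values are made up).
toy = pd.DataFrame({
    "State": ["Goa", "Goa", "Goa", "Kerala", "Kerala"],
    "Year": [2001, 2001, 2002, 2001, 2002],
    "Total": [1, 2, 4, 10, 20],
})

# One row per state, one column per year; each cell holds the summed totals.
table = toy.pivot_table(index="State", columns="Year", values="Total", aggfunc="sum")

# Order the states by their total across all years, highest first.
order = table.sum(axis=1).sort_values(ascending=False).index
table = table.loc[order]
print(table)
```

Because `values="Total"` is given explicitly, the resulting columns are plain years rather than a two-level index.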

NOW WE SHALL PLOT THE HEATMAP TO ANALYSE THE TRENDS IN TOP STATES IN EACH YEAR.

In [None]:
plt.figure(figsize=(8,8))
# `linewidths` is the documented seaborn parameter for the gap between cells
sns.heatmap(y,linewidths=1,cmap='OrRd',square=True)

*ONE ANOMALY WE CAN CLEARLY SEE IS THAT THE NUMBER OF SUICIDES IN WEST BENGAL SUDDENLY DROPPED IN THE YEAR 2012.*

NOW WE SHALL EXPLORE AND VISUALISE THE DIFFERENT FACTORS PRESENT IN THE TYPE_CODE COLUMN

In [None]:
data['Type_code'].value_counts()

CLEARLY WE HAVE 5 CATEGORIES WHICH WE NEED TO EXPLORE.

1. CAUSES

In [None]:
data3 = data[data['Type_code']=='Causes']
reasons = data3.groupby('Type')['Total'].sum()
suicide_reasons = pd.DataFrame(reasons).reset_index().sort_values('Total',ascending=False)
suicide_reasons = suicide_reasons[:15]
plt.figure(figsize=(18,6))
g2 = sns.barplot(y='Type',x='Total',data=suicide_reasons)

*FROM THE ABOVE GRAPH WE CAN CLEARLY SEE THE TOP REASONS FOR SUICIDE. FAMILY PROBLEMS IS THE MOST FREQUENT REASON FOR SUICIDE.*

2. MEANS_ADOPTED

In [None]:
data3 = data[data['Type_code']=='Means_adopted']
reasons = data3.groupby('Type')['Total'].sum()
suicide_reasons = pd.DataFrame(reasons).reset_index().sort_values('Total',ascending=False)
suicide_reasons = suicide_reasons[:15]
plt.figure(figsize=(18,6))
g2 = sns.barplot(y='Type',x='Total',data=suicide_reasons)


*WE CAN CLEARLY SEE THE TOP MEANS OF SUICIDE IN INDIA, OF WHICH HANGING IS THE MOST COMMON.*

3. PROFESSIONAL PROFILE

In [None]:
data3 = data[data['Type_code']=='Professional_Profile']
reasons = data3.groupby('Type')['Total'].sum()
suicide_reasons = pd.DataFrame(reasons).reset_index().sort_values('Total',ascending=False)
suicide_reasons = suicide_reasons[:15]
plt.figure(figsize=(18,6))
g2 = sns.barplot(y='Type',x='Total',data=suicide_reasons)

HERE WE CAN SEE IN WHICH OCCUPATIONS PEOPLE ARE COMMITTING MORE SUICIDES. THIS MAY HELP IN IMPROVING WORKING CONDITIONS IN THESE OCCUPATIONS.

*ONE INTERESTING THING TO NOTE HERE IS THAT THE NUMBER OF SUICIDES AMONG PEOPLE IN PRIVATE SECTOR JOBS AND AMONG UNEMPLOYED PEOPLE IS ALMOST EQUAL.*

4. EDUCATION STATUS

In [None]:
data3 = data[data['Type_code']=='Education_Status']
reasons = data3.groupby('Type')['Total'].sum()
suicide_reasons = pd.DataFrame(reasons).reset_index().sort_values('Total',ascending=False)
#suicide_reasons = suicide_reasons[:15]
plt.figure(figsize=(18,6))
g2 = sns.barplot(y='Type',x='Total',data=suicide_reasons)

THE ABOVE GRAPH DEPICTS THE EDUCATIONAL QUALIFICATION OF THE PEOPLE WHO COMMITTED SUICIDE.

5. SOCIAL STATUS

In [None]:
data3 = data[data['Type_code']=='Social_Status']
reasons = data3.groupby('Type')['Total'].sum()
suicide_reasons = pd.DataFrame(reasons).reset_index().sort_values('Total',ascending=False)
#suicide_reasons = suicide_reasons[:15]
plt.figure(figsize=(18,6))
g2 = sns.barplot(y='Type',x='Total',data=suicide_reasons)

THE ABOVE GRAPH SHOWS THE MARITAL STATUS OF THE PEOPLE WHO COMMITTED SUICIDE.
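
The five cells above repeat the same filter, group and sort steps with a different `Type_code` each time. One way to avoid the repetition is a small helper; `top_types` is a hypothetical name introduced here for illustration, and the toy rows are made up.

```python
import pandas as pd

def top_types(df, type_code, n=15):
    """Return the n largest Type totals within one Type_code category."""
    subset = df[df["Type_code"] == type_code]
    totals = subset.groupby("Type")["Total"].sum()
    return totals.sort_values(ascending=False).head(n).reset_index()

# Toy rows standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Type_code": ["Causes", "Causes", "Causes", "Means_adopted"],
    "Type": ["Type A", "Type B", "Type A", "Type C"],
    "Total": [3, 10, 4, 99],
})

top_causes = top_types(toy, "Causes", n=2)
print(top_causes)
```

On the real data, a call such as `top_types(data, 'Causes')` would return a frame that can be passed to `sns.barplot(y='Type', x='Total', data=...)` as in the cells above.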

### NOW WE WILL TRY TO FIND WHICH AGE GROUP COMMITS MORE SUICIDE

In [None]:
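# Total suicides per age group
age_grp = data.groupby('Age_group')['Total'].sum()
age = pd.DataFrame(age_grp).reset_index()
# Skip the first group in sorted order (presumably an aggregate bucket, not a real age band)
age = age[1:]
age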

In [None]:
plt.subplots(figsize=(5,5))
g = sns.barplot(x='Age_group',y='Total',data=age)

CLEARLY, PEOPLE AGED BETWEEN 15 AND 45 COMMITTED THE MOST SUICIDES.

*IT IS SHOCKING THAT CHILDREN UNDER THE AGE OF 14 ALSO COMMITTED SUICIDE.*

PLOT OF YEARLY TREND IN SUICIDE IN INDIA.

In [None]:
# Drop the trailing 'sum' entry and the unused column level, leaving one row per year
yearly = pd.DataFrame(yearly_total).reset_index()[:-1].drop('level_0',axis=1)
yearly.columns = ['Year','No of suicides']
plt.figure(figsize=(10,5))
sns.lineplot(x='Year',y='No of suicides',data=yearly)


*WE CAN CLEARLY SEE THAT THE STEEPEST RISES IN THE GRAPH CAME IN 2005-2006 AND AGAIN IN 2009-2010. DURING THESE YEARS THE NUMBER OF SUICIDES ROSE SHARPLY COMPARED WITH THE PREVIOUS YEARS.*

THE GOOD NEWS IS THAT THE SLOPE FLATTENS IN 2010-2011 AND THE NUMBER FINALLY DECREASES IN 2011-2012.
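
Reading slopes off a line plot is approximate. The year-over-year change can be computed directly with `diff()` and `pct_change()`. The sketch below uses made-up numbers (`yearly_toy` is hypothetical); the same two lines could be applied to the `yearly` frame.

```python
import pandas as pd

# Toy yearly totals (values are made up, not taken from the dataset).
yearly_toy = pd.DataFrame({
    "Year": [2001, 2002, 2003, 2004],
    "No of suicides": [100, 110, 121, 115],
})

# Absolute change from the previous year (the first year has no predecessor).
yearly_toy["Change"] = yearly_toy["No of suicides"].diff()

# Relative change from the previous year, in percent.
yearly_toy["Change %"] = yearly_toy["No of suicides"].pct_change() * 100
print(yearly_toy)
```

A negative value in `Change` marks a year in which the total fell, while a shrinking positive value marks a flattening slope.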

## NOW WE WILL DIVE DEEP INTO THE DATASET WHICH CONTAINS DATA OF STUDENTS ONLY

EXPLORING THE DATA

In [None]:
student = data[data['Type']=='Student']
student

PLOTTING THE GRAPH AGEWISE IN STUDENTS DATASET.

In [None]:
age_grp = student.groupby('Age_group')['Total'].sum()
age = pd.DataFrame(age_grp).reset_index()
age

In [None]:
plt.subplots(figsize=(5,5))
g = sns.barplot(x='Age_group',y='Total',data=age)

PLOTTING YEARLY TREND IN NUMBER OF SUICIDES COMMITTED BY STUDENTS

In [None]:
std_year = student.groupby('Year')['Total'].sum()
stdyr = pd.DataFrame(std_year).reset_index()
plt.figure(figsize=(12,6))
sns.lineplot(x='Year',y='Total',data=stdyr)

FROM THE GRAPH WE CAN SEE THAT THE NUMBER RISES OVERALL BUT DOES NOT FOLLOW A CLEAR PATTERN. THE MOST SUICIDES HAPPENED IN THE INTERVAL 2008-2011.

PLOTTING THE DATA OF STUDENTS STATEWISE.

In [None]:
grp = student.groupby('State')['Total'].sum()
total_suicides = pd.DataFrame(grp).reset_index().sort_values('Total',ascending=False)[:10]
plt.figure(figsize=(15,6))
sns.barplot(y='State',x='Total',data=total_suicides)

FINDING THE NUMBER OF MALES AND FEMALES IN STUDENTS DATASET.

In [None]:
gen = student.groupby('Gender')['Total'].sum()
gender = pd.DataFrame(gen).reset_index()
gender

In [None]:
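# Take the labels from the grouped data so that they always match the values
piex = gender['Gender'].tolist()
piey = gender['Total']
fig1, ax1 = plt.subplots()
# autopct prints each slice's percentage on the chart
ax1.pie(piey, labels=piex, autopct='%1.1f%%')
plt.show()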

FINALLY, WE PLOTTED THE PIE CHART OF STUDENT SUICIDES BY GENDER. WE CAN INFER THAT THERE IS NO SIGNIFICANT DIFFERENCE IN THE NUMBERS.

**HENCE WE HAVE ANSWERED ALL THE QUESTIONS AND DERIVED USEFUL INSIGHTS FROM THIS DATASET ON SUICIDE, WHICH IS A MAJOR CONCERN IN ANY COUNTRY.**
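
A pie chart makes it hard to judge how close two slices are. Computing each gender's share as a percentage gives a number to back the comparison. This sketch uses made-up counts (`gender_toy` is hypothetical); the same expression could be applied to the `gender` frame.

```python
import pandas as pd

# Toy counts per gender (values are made up, not taken from the dataset).
gender_toy = pd.DataFrame({"Gender": ["Female", "Male"], "Total": [40, 60]})

# Share of each gender in the overall total, in percent.
gender_toy["Share %"] = (gender_toy["Total"] * 100 / gender_toy["Total"].sum()).round(1)
print(gender_toy)
```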