__Suicide in the 21st Century__


In 2017, a man whom I considered one of the greatest musicians of all time, and the author or co-author of a number of songs that had become the soundtrack of my life, died by suicide. Many were shocked, and many miss him today.
In 2018, a man whose TV programs kept me in a state of respect for, and fascination with, the diversity of the world died by suicide. Many were shocked, and many miss him today.

Close to 800,000 less famous people die by suicide every year, according to the WHO. I looked at the WHO Suicide Statistics to see where the world as a whole is heading.

Is it getting better or worse? You will see my answer below.


In [None]:
import pandas as pd
import numpy as np
import random
from sklearn import metrics
import seaborn as sns;sns.set()
import matplotlib.pyplot as plt
%matplotlib inline
import holoviews as hv
from holoviews import opts, dim, Palette
import geoviews as gv
hv.extension('bokeh', 'matplotlib')
opts.defaults(
    opts.Bars(xrotation=45, tools=['hover']),
    opts.Curve(width=600,height=400, tools=['hover']),
    
    opts.Scatter(width=800, height=600, color=Palette('Category20'), tools=['hover']),
    opts.NdOverlay(legend_position='top_left'))


In [None]:
df = pd.read_csv('../input/master.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.info()

__HDI for year__ has missing values.

In [None]:
df.describe()

In [None]:
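# Select several columns with a list; tuple-style selection fails in recent pandas.
df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
df_country["suicides_per_100k"]=df_country["suicides_no"]/(df_country["population"]/100000)
# Rough yearly average: 31 is a fixed divisor, although countries report different numbers of years.
df_country["Average_suicide"] = (df_country["suicides_no"]/31).astype(int)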
df_country.head()


In [None]:
def avg(): 
    g=df_country.sort_values("Average_suicide").tail(10)
    return(g)
   
avg()

__Insights :-__

The average yearly number of suicides (`Average_suicide` is a count, not a rate) in the __Russian Federation__ and the __United States__ leads the overall data by more than a factor of three.
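
The yearly average above divides by a fixed 31. A minimal sketch of an alternative, assuming the same `country`, `year` and `suicides_no` column names, divides each country's total by the number of years it actually reported. The toy values and the helper name `yearly_average` are made up for illustration.

```python
import pandas as pd

# Toy data with the same column names as master.csv (values are invented).
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001],
    "suicides_no": [30, 60, 90, 10, 30],
})

def yearly_average(frame):
    """Average suicides per reporting year, per country (hypothetical helper)."""
    per_country = frame.groupby("country").agg(
        total=("suicides_no", "sum"),
        years=("year", "nunique"),  # years this country actually reported
    )
    per_country["average_per_year"] = per_country["total"] / per_country["years"]
    return per_country

result = yearly_average(toy)
print(result)
```

With this approach a country that reported only a few years is not penalised by a divisor that assumes full coverage.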

In [None]:
def world_map():  # renamed from map() to avoid shadowing the Python builtin
    import geopandas as gpd
    # Note: gpd.datasets was removed in geopandas 1.0, so this call needs an older release.
    world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
    world=world.rename(index=str, columns={"name": "country"})
    
    df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
    df_country["suicides_per_100k"]=df_country["suicides_no"]/(df_country["population"]/100000)
    df_country["Average_suicide"] = (df_country["suicides_no"]/31).astype(int)
    # Align the dataset's country name with the one used by the map.
    df_country["country"]=df_country["country"].str.replace("Russian Federation","Russia")
    
    df1 = pd.merge(world, df_country,  on='country', how='outer')
    
    # Drop rows that did not match on both sides.
    df1 = df1.dropna(axis=0)
    polys = gv.Polygons(df1, vdims=['suicides_per_100k',"Average_suicide", 'suicides_no', 'country'])
    polys.opts(width=800, height=400, tools=['hover'], cmap='viridis', ylim=(-60, 90))
    return(polys)
world_map()

__Some countries are missing from our dataset, so the world map shows gaps.__
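
Some gaps may come from spelling differences between the map and the dataset rather than from missing data. A minimal sketch, assuming both frames have a `country` column, uses the `indicator=True` option of `merge` to list names that appear on only one side. The example names and the helper `unmatched_names` are illustrative and not checked against the real map file.

```python
import pandas as pd

# Illustrative name lists; the real ones come from the map file and master.csv.
map_names = pd.DataFrame({"country": ["Russia", "South Korea", "France"]})
data_names = pd.DataFrame(
    {"country": ["Russian Federation", "Republic of Korea", "France"]}
)

def unmatched_names(left, right, key="country"):
    """Return (names only in left, names only in right) (hypothetical helper)."""
    merged = left[[key]].drop_duplicates().merge(
        right[[key]].drop_duplicates(), on=key, how="outer", indicator=True
    )
    only_left = sorted(merged.loc[merged["_merge"] == "left_only", key])
    only_right = sorted(merged.loc[merged["_merge"] == "right_only", key])
    return only_left, only_right

only_map, only_data = unmatched_names(map_names, data_names)
print("Only in map:", only_map)
print("Only in data:", only_data)
```

Names that show up in both lists under different spellings are candidates for the same kind of rename the notebook applies to the Russian Federation.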

In [None]:
def top_countries():
    df_country = df.groupby(["country","year"],as_index=False)[["population","suicides_no"]].sum()
    stage = df_country.loc[df_country['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]

    macro = hv.Dataset(stage, ['country', 'year'])
    plot = macro.to(hv.Curve, 'year', ['suicides_no',"population"]).overlay()

    # relabel() returns a new object, so keep the result; the curves show counts, not rates.
    plot = plot.relabel('Number of suicides between 1985 - 2016')
    return(plot)
top_countries()

In [None]:
def all_countries():
    df_country=df.groupby(["country","year"],as_index=False)[["population","suicides_no"]].sum()
    macro = hv.Dataset(df_country, ['country', 'year'])
    curves = macro.to(hv.Curve, 'year', 'suicides_no', groupby='country')
    
    return(curves)
all_countries()


__Insights :-__

__Countries with increasing trend in suicide are:__
* Brazil
* Mexico
* Republic of Korea
* United States

__Countries with constant but considerable suicides per year are:__
* France
* Germany
* Kazakhstan
* Sri Lanka
* Thailand
* Ukraine

__Countries where suicides first increased and then decreased (the number of suicides in these countries is quite high compared to other countries):__
* Japan
* Russian Federation


In [None]:
def suicides_per_100k():
    df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
    df_country["suicides_per_100k"]=df_country["suicides_no"]/(df_country["population"]/100000)
    plt.figure(figsize=(10,20))
    # Dashed red line marks the mean rate across countries.
    plt.axvline(df_country["suicides_per_100k"].mean(), color='r', linestyle='--')
    df_country.sort_values('suicides_per_100k',inplace=True)
    sns.barplot(y=df_country["country"], x=df_country["suicides_per_100k"])
    plt.xlabel('No. of suicides per 100k')
    plt.ylabel('Countries')
    plt.title('Suicides per 100k per country from 1985 to 2016')
    return(plt.show())
suicides_per_100k()

In [None]:
def suicides_per_100ks():
    df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
    df_country["suicides_per_100k"]=df_country["suicides_no"]/(df_country["population"]/100000)
    macro = hv.Dataset(df_country, ["country"])
    bars = macro.to(hv.Bars, "country", 'suicides_per_100k')
    bars.opts(width=900,height=400)

    return(bars)
#suicides_per_100ks()

__Insights :-__

__Lithuania's__ rate has been the highest by a large margin: > 40 suicides per 100k (per year).


The __Russian Federation's__ rate has been the second highest: > 30 suicides per 100k (per year).
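
The per-100k figures above are pooled rates: total suicides divided by total population over all years. A minimal sketch, using invented numbers and a hypothetical helper `rates_per_100k`, shows how this can differ from the mean of the yearly rates when the population changes over time.

```python
import pandas as pd

# Toy data with the same column names as master.csv (values are invented).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "population": [1_000_000, 3_000_000, 500_000, 500_000],
    "suicides_no": [100, 600, 50, 150],
})

def rates_per_100k(frame):
    """Compare the pooled rate with the mean of yearly rates (hypothetical helper)."""
    yearly = frame.assign(
        rate=frame["suicides_no"] / frame["population"] * 100_000
    )
    totals = frame.groupby("country")[["population", "suicides_no"]].sum()
    return pd.DataFrame({
        # Pooled: all suicides over all person-years, as in the notebook.
        "pooled_rate": totals["suicides_no"] / totals["population"] * 100_000,
        # Unweighted mean of the per-year rates.
        "mean_yearly_rate": yearly.groupby("country")["rate"].mean(),
    })

rates = rates_per_100k(toy)
print(rates)
```

The pooled rate weights years with a larger population more heavily, so the two columns agree only when the population is stable.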

In [None]:
def max_suicide():
    df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
    plt.figure(figsize=(10,20))
    plt.axvline(df_country["suicides_no"].mean(), color='r', linestyle='--')
    df_country.sort_values('suicides_no',inplace=True)
    sns.barplot(y=df_country["country"], x=df_country["suicides_no"])
    plt.xlabel('No. of suicides')
    plt.ylabel('Countries')
    plt.title('Total No. of Suicides per country from 1985 to 2016')
    return(plt.show())
max_suicide()

In [None]:
def max_suicides():
    df_country=df.groupby(["country"],as_index=False)[["population","suicides_no"]].sum()
    macro = hv.Dataset(df_country, ["country"])
    bars = macro.to(hv.Bars, "country", 'suicides_no')
    bars.opts(width=900,height=400)

    return(bars)
   
#max_suicides()

__Insights :-__

Countries with the highest __number of suicides__ are:
* France
* Japan
* Russian Federation
* United States

 Based on the above list, let's check the record of the last 10 years.


In [None]:
def past_ten_years():
    df_country=df.groupby(["country","year","sex"],as_index=False)[["population","suicides_no"]].sum()
   # df_country = df_country[df_country["year"] > 2005]
    stage = df_country.loc[df_country['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]

    macro = hv.Dataset(stage, ['year',"sex"])
    bars = macro.sort('country').to(hv.Bars, 'country', 'suicides_no')
    bars.opts(width=600,height=400)

    return(bars)
past_ten_years()

In [None]:
def gender():
    df_country=df.groupby(["country","year","sex"],as_index=False)[["population","suicides_no"]].sum()
    df_country = df_country[df_country["year"] > 2005]
    stage = df_country.loc[df_country['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]
    stagem = stage.loc[stage['sex']=="male"]
    stagef = stage.loc[stage['sex']=="female"]
    
    macrof = hv.Dataset(stagef, ['country', 'year'])
    macrom = hv.Dataset(stagem, ['country', 'year'])
    curvem = macrom.to(hv.Curve, 'year', ['suicides_no', 'sex'], label="Male")
    curvef = macrof.to(hv.Curve, 'year', ['suicides_no', 'sex'], label="Female")
    
    curves=(curvem * curvef)
    
    return(curves)
gender()

__Insights :-__

__According to the data, males account for more suicides than females.__

Over the past __10__ years:


The __United States__, __Brazil__, __Mexico__ and the __Republic of Korea__ show an increasing trend in the number of suicides;

__Japan__ and the __Russian Federation__ show a decreasing trend, whereas __France__ shows a roughly constant trend.
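
The trends above are read off the charts. A minimal sketch of a numeric check, assuming yearly totals per country, fits a straight line per country and reports its slope: positive means increasing, negative means decreasing. The toy values and the helper name `yearly_slope` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy yearly totals (invented): A rises by 10 per year, B falls by 10 per year.
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2006, 2007, 2008, 2009] * 2,
    "suicides_no": [100, 110, 120, 130, 90, 80, 70, 60],
})

def yearly_slope(frame):
    """Least-squares slope of suicides_no against year, per country (hypothetical helper)."""
    slopes = {}
    for country, group in frame.groupby("country"):
        # Centre the years to keep the fit numerically well behaved.
        x = group["year"] - group["year"].mean()
        slope, _intercept = np.polyfit(x, group["suicides_no"], deg=1)
        slopes[country] = slope
    return pd.Series(slopes, name="slope_per_year")

slopes = yearly_slope(toy)
print(slopes)
```

A slope close to zero relative to the country's yearly total would correspond to the "constant" group.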


 Let's check the countries with an increasing suicide trend.

In [None]:
def generations():
    df_country = df.groupby(["country","year","generation"],as_index=False)[["population","suicides_no"]].sum()
    df_s = df_country[df_country["year"] > 2005]
    
    # Build the mask from df_s itself so its index matches the frame being filtered.
    stage = df_s.loc[df_s['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]

    stage1 = stage.loc[stage['generation'].str.contains("Generation X")]
    stage2 = stage.loc[stage['generation'].str.contains("Silent")]
    stage3 = stage.loc[stage['generation'].str.contains("G.I. Generation")]
    stage4 = stage.loc[stage['generation'].str.contains("Boomers")]
    stage5 = stage.loc[stage['generation'].str.contains("Millenials")]
    stage6 = stage.loc[stage['generation'].str.contains("Generation Z")]
    
    
    macro1 = hv.Dataset(stage1, ['country', 'year'])
    macro2 = hv.Dataset(stage2, ['country', 'year'])
    macro3 = hv.Dataset(stage3, ['country', 'year'])
    macro4 = hv.Dataset(stage4, ['country', 'year'])
    macro5 = hv.Dataset(stage5, ['country', 'year'])
    macro6 = hv.Dataset(stage6, ['country', 'year'])
    
    
    curve1 = macro1.to(hv.Curve, 'year', 'suicides_no', label="Generation X")
    curve2 = macro2.to(hv.Curve, 'year', 'suicides_no', label="Silent")
    curve3 = macro3.to(hv.Curve, 'year', 'suicides_no', label="G.I. Generation")
    curve4 = macro4.to(hv.Curve, 'year', 'suicides_no', label="Boomers")
    curve5 = macro5.to(hv.Curve, 'year', 'suicides_no', label="Millenials")
    curve6 = macro6.to(hv.Curve, 'year', 'suicides_no', label="Generation Z")
    
    
    curves=(curve1 * curve2 * curve3 *curve4 *curve5 *curve6).opts( legend_position='top_right')
    
    
    return(curves)
generations()

__Insights :-__

__Generation X (with a high peak around 2010)__ and __Millennials (with a rapid increase around 2010)__ stand out compared to other generations;

__Silent (with a rapid decrease around 2010)__ and __Boomers__ show a significant decrease in the number of suicides;

whereas __Generation Z__ follows a constant trend.

__2010 is an interesting year; it deserves a closer look.__

In [None]:
def age():
    df_country=df.groupby(["country","year","age"],as_index=False)[["population","suicides_no"]].sum()
    df_s=df_country[df_country["year"] > 2005]
    stage = df_country.loc[df_country['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]
    
    
    macro = hv.Dataset(stage, ['year',"age"])
    bars = macro.sort('country').to(hv.Bars, 'country', 'suicides_no')
    bars.opts(width=600,height=400)
    return(bars)
age()

In [None]:
def ages():
    df_country = df.groupby(["country","year","age"],as_index=False)[["population","suicides_no"]].sum()
   # df_s = df_country[df_country["year"] > 2005]
    stage = df_country.loc[df_country['country'].isin(["Mexico","Republic of Korea","France","Japan","Russian Federation","United States","Brazil"])]
    # regex=False matches the labels literally; otherwise the "+" in "75+ years" is a regex quantifier.
    stage1 = stage.loc[stage['age'].str.contains("15-24 years", regex=False)]
    stage2 = stage.loc[stage['age'].str.contains("35-54 years", regex=False)]
    stage3 = stage.loc[stage['age'].str.contains("75+ years", regex=False)]
    stage4 = stage.loc[stage['age'].str.contains("25-34 years", regex=False)]
    stage5 = stage.loc[stage['age'].str.contains("55-74 years", regex=False)]
    stage6 = stage.loc[stage['age'].str.contains("5-14 years", regex=False)]
    
    
    macro1 = hv.Dataset(stage1, ['country', 'year'])
    macro2 = hv.Dataset(stage2, ['country', 'year'])
    macro3 = hv.Dataset(stage3, ['country', 'year'])
    macro4 = hv.Dataset(stage4, ['country', 'year'])
    macro5 = hv.Dataset(stage5, ['country', 'year'])
    macro6 = hv.Dataset(stage6, ['country', 'year'])
    
    
    curve1 = macro1.to(hv.Curve, 'year', 'suicides_no', label="15-24 years")
    curve2 = macro2.to(hv.Curve, 'year', 'suicides_no', label="35-54 years")
    curve3 = macro3.to(hv.Curve, 'year', 'suicides_no', label="75+ years")
    curve4 = macro4.to(hv.Curve, 'year', 'suicides_no', label="25-34 years")
    curve5 = macro5.to(hv.Curve, 'year', 'suicides_no', label="55-74 years")
    curve6 = macro6.to(hv.Curve, 'year', 'suicides_no', label="5-14 years")
    
    
    curves=(curve1 * curve2 * curve3 *curve4 *curve5 *curve6).opts( legend_position='top_right')
    
    
    return(curves)
ages()

__Insights :-__

According to the data, people between __35__ and __74__ years show the strongest tendency to die by suicide, whereas below __14__ and above __75__ years there are very few or almost no cases in comparison to other age groups. The __75+__ figure is only reliable if the `75+ years` label is matched literally, because `+` acts as a regex quantifier in `str.contains`.

The __35-54 years__ and __55-74 years__ age groups dominate our data.

The __25-34 years__ group cannot be neglected either.
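
When filtering on a label such as `75+ years`, it helps to remember that `Series.str.contains` treats its pattern as a regular expression by default. A minimal sketch, using only the age labels that appear in this notebook, compares a regex match, a literal match and an exact comparison.

```python
import pandas as pd

# Age labels as they are written in the filters of this notebook.
ages = pd.Series(["5-14 years", "15-24 years", "55-74 years", "75+ years"])

# Default regex mode: "5+" means "one or more 5s", so the literal "+" never matches.
regex_mask = ages.str.contains("75+ years")

# Literal substring match.
literal_mask = ages.str.contains("75+ years", regex=False)

# Exact equality, the strictest option.
exact_mask = ages == "75+ years"

print(regex_mask.sum(), literal_mask.sum(), exact_mask.sum())
```

In regex mode the filter selects no rows at all, which would make the oldest age group look empty in any chart built from it.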

In [None]:
def overall():
    
    df_country=df.groupby(["year","sex"],as_index=False)[["population","suicides_no"]].sum()
    
    macro = hv.Dataset(df_country, ['year',"sex"])
    bars = macro.to(hv.Bars, "sex", 'suicides_no')
    bars.opts(width=350,height=400)

    return(bars)

overall()

__Insight :-__
The overall number of suicides has increased over the years.

 _Do the GDP and population of a country affect the suicide rate?_

In [None]:
def gdp():
    df_gdp = df.groupby(["country","year","gdp_per_capita ($)"],as_index=False)[["population","suicides_no"]].sum()
    # Use df_gdp's own population column so the rows line up.
    df_gdp["suicides_per_100k"] = df_gdp["suicides_no"]/(df_gdp["population"]/100000)
    # Put GDP on the x-axis so the chart matches the regression below (X = GDP, Y = suicides).
    sns.pairplot(df_gdp, x_vars=["gdp_per_capita ($)"], y_vars=['suicides_no'], height=6, aspect=2, kind='reg')
    X = df_gdp["gdp_per_capita ($)"].values.reshape(-1,1)
    y = df_gdp["suicides_no"]
    from sklearn.linear_model import LinearRegression
    lm = LinearRegression()
    lm.fit(X,y)
    print(lm.intercept_)
    print(lm.coef_)
    # coef_ is an array with one entry per feature; print the single slope.
    print("Y = {} * X + {}".format(lm.coef_[0], lm.intercept_))
gdp()    

__Insights :-__

A weak relation, as can be seen in the above graph; let's evaluate.

X = GDP per Capita ($),

Y = Suicides number.

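__Y = 0.035*X + 2316__; the slope means that each additional __$1__ of __GDP per capita__ is associated with about __0.035__ more suicides, i.e. roughly one additional suicide per __$29__ (1/0.035). The intercept, __2316__, is the predicted number of suicides at a GDP per capita of zero.

Not a very good regression, as the correlation is very weak.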
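
To put a number on "weak", the correlation coefficient and R² can be computed alongside the fitted line. The sketch below uses invented data and a hypothetical helper `fit_line`; it does not reproduce the figures from the dataset.

```python
import numpy as np

def fit_line(x, y):
    """Return slope, intercept, Pearson r and R^2 of a straight-line fit (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    # For a one-variable linear fit, R^2 equals r squared.
    return slope, intercept, r, r ** 2

# Invented example values: X = GDP per capita ($), Y = number of suicides.
gdp_per_capita = [1000, 2000, 3000, 4000]
suicides = [10, 30, 20, 40]

slope, intercept, r, r_squared = fit_line(gdp_per_capita, suicides)
print("slope (suicides per $):", slope)
print("intercept (suicides at $0):", intercept)
print("dollars per additional suicide:", 1 / slope)
print("r:", r, "R^2:", r_squared)
```

The slope is in suicides per dollar, so its reciprocal gives the dollars associated with one additional suicide, and R² gives the share of the variance that the line explains.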



__Please upvote my work if you like this sheet.__

__Thanks for your support.__