**IMPORTING LIBRARIES**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

**READING THE DATASET**

In [None]:
data=pd.read_csv('/kaggle/input/suicide-rates-overview-1985-to-2016/master.csv')

In [None]:
data.head()

In [None]:
data.columns

**Let us rename some of the columns to make them easier to read.**

In [None]:
data.rename(columns={'suicides/100k pop':'Suicides Per 100k Pop', ' gdp_for_year ($) ':'GDP For Year',
                          'gdp_per_capita ($)':'GDP Per Capita'}, inplace=True)
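The raw header ' gdp_for_year ($) ' has leading and trailing spaces, so the rename key above has to match it exactly. A minimal sketch, on a made-up two-row frame whose values are illustrative and not taken from master.csv, that strips stray whitespace from every header first so the mapping no longer depends on that padding:

```python
import pandas as pd

# Hypothetical frame that mimics the raw headers of master.csv (values are made up)
raw = pd.DataFrame({
    'suicides/100k pop': [1.0, 2.0],
    ' gdp_for_year ($) ': ['1,000', '2,000'],
    'gdp_per_capita ($)': [100, 200],
})

# Strip surrounding whitespace from every column name
raw.columns = raw.columns.str.strip()

# The rename keys no longer need the leading/trailing spaces
clean = raw.rename(columns={
    'suicides/100k pop': 'Suicides Per 100k Pop',
    'gdp_for_year ($)': 'GDP For Year',
    'gdp_per_capita ($)': 'GDP Per Capita',
})
print(list(clean.columns))
```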

In [None]:
data.shape

**The dataset has 27,820 rows and 12 columns. Let us check for null values.**

In [None]:
data.isnull().sum()

**We can drop the HDI for year column, since a large share of its values are missing.**

In [None]:
data.drop(['HDI for year'],axis=1,inplace=True)
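Instead of judging "a lot of null values" by eye, you can compute the fraction of missing values per column and drop every column above a cutoff. A sketch on made-up rows follows. The 0.5 threshold is an assumption chosen for illustration, not a rule from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame: 'HDI for year' is mostly missing, the other columns are complete
df = pd.DataFrame({
    'year': [1987, 1988, 1989, 1990],
    'suicides_no': [21, 16, 14, 1],
    'HDI for year': [np.nan, np.nan, np.nan, 0.62],
})

# isnull().mean() gives the fraction of missing values in each column
missing_frac = df.isnull().mean()

# Assumed cutoff: drop columns that are more than half empty
threshold = 0.5
to_drop = missing_frac[missing_frac > threshold].index.tolist()
df = df.drop(columns=to_drop)
print(missing_frac)
print(to_drop)
```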

In [None]:
data.head()

In [None]:
data.hist(grid=True,figsize=(15,20),color='lime')
plt.show()

**Let us look at the number of records for each country:**

In [None]:
alpha = 0.7
plt.figure(figsize=(10,25))
sns.countplot(y='country', data=data, alpha=alpha)
plt.title('Data by country')
plt.show()

**Iceland, Netherlands, Mauritius and Austria have the most records in the dataset.**
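Reading exact counts off a chart with over a hundred bars is error-prone. `value_counts` gives the numbers directly, and filtering on the maximum keeps every country tied for first place. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; in the real data each row is one (country, year, sex, age) record
df = pd.DataFrame({'country': ['Iceland', 'Iceland', 'Austria', 'Austria', 'Albania']})

counts = df['country'].value_counts()
# Keep every country tied with the maximum count, not just the first one listed
top = counts[counts == counts.max()].index.tolist()
print(counts)
print(sorted(top))
```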

In [None]:
plt.rcParams['figure.figsize']=(6,6)

In [None]:
data['sex'].value_counts().plot.bar(color='red')

**Let us compare the number of suicides among males and females across age groups.**

In [None]:
sns.barplot(x='sex', y='suicides_no', hue='age', data=data)

**Among males, the 35-54 years age group has the highest average number of suicides.**

**Females show the same pattern.**
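By default, `sns.barplot` draws the mean of `suicides_no` per row, so these bars compare average counts, not rates. To get a rate per 100k for each group, you can sum suicides and population within the group and then divide. A sketch on made-up numbers, not taken from master.csv. In this toy example the 75+ male group has the highest rate even though the 35-54 group has more deaths, which shows why the distinction matters.

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names
df = pd.DataFrame({
    'sex': ['male', 'male', 'female', 'female'],
    'age': ['35-54 years', '75+ years', '35-54 years', '75+ years'],
    'suicides_no': [300, 50, 100, 20],
    'population': [1_000_000, 100_000, 1_000_000, 200_000],
})

# Sum counts and population per group, then divide: a pooled rate, not a mean of rates
grouped = df.groupby(['sex', 'age'])[['suicides_no', 'population']].sum()
grouped['rate_per_100k'] = grouped['suicides_no'] / grouped['population'] * 100_000
print(grouped)
```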

**Now, let us compare them by generation.**

In [None]:
sns.barplot(x='sex', y='suicides_no', hue='generation', data=data)

In [None]:
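plt.figure(figsize=(30,10))
# Larger fonts and thicker lines for the wide figure (this changes the global seaborn context)
sns.set_context("paper", font_scale=2.0, rc={"lines.linewidth": 4})
# Count the number of rows recorded for each year
sns.countplot(x='year', data=data)
plt.show()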

**The number of records is highest for 2009, followed by 2001 and 2010. Note that this plot counts rows per year, not suicides.**
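A count plot measures how many rows each year has, not how many suicides were recorded. Ranking years by suicides needs a grouped sum instead. A sketch on made-up rows where the two rankings disagree:

```python
import pandas as pd

# Made-up rows: 2009 has the most records but the fewest suicides
df = pd.DataFrame({
    'year': [2001, 2001, 2009, 2009, 2009, 2010],
    'suicides_no': [500, 400, 10, 20, 30, 700],
})

rows_per_year = df['year'].value_counts()                    # what countplot shows
suicides_per_year = df.groupby('year')['suicides_no'].sum()  # actual totals per year
print(rows_per_year.idxmax())      # year with the most records
print(suicides_per_year.idxmax())  # year with the most suicides
```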

In [None]:
plt.figure(figsize=(14,4))
plt.subplot(121)
plt.title('Suicide Number')
# sns.distplot is deprecated; kdeplot draws the density curve that distplot(hist=False) produced
sns.kdeplot(data['suicides_no'])
plt.subplot(122)
plt.title('Suicide Number Per 100k population')
sns.kdeplot(data['Suicides Per 100k Pop'])
plt.tight_layout()
plt.show()

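**Both distributions are strongly right-skewed, and the suicide counts vary widely.**

**Before computing correlations, let us convert GDP For Year from comma-separated strings to numbers.**

In [None]:
data['GDP For Year'] = data['GDP For Year'].str.replace(',', '', regex=False).astype(float)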
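To put a number on the right skew seen in the two density plots, `Series.skew()` returns the sample skewness, which is positive for a long right tail. `np.log1p` is one common way to compress such a tail for plotting, and it handles zero counts. The sketch below uses made-up counts. The log transform is a suggestion and not part of the notebook's analysis.

```python
import numpy as np
import pandas as pd

# Made-up counts with a long right tail, similar in shape to suicides_no
s = pd.Series([0, 1, 2, 3, 5, 8, 13, 40, 250, 1200])

print(s.skew())            # large positive value => right-skewed
print(np.log1p(s).skew())  # log(1 + x) pulls in the tail, so skewness drops
```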

In [None]:
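plt.subplots(figsize=(12,6))
# numeric_only=True skips text columns such as country and sex (required in pandas >= 2.0)
sns.heatmap(data.corr(numeric_only=True), annot=True, linewidths=0.3)
plt.title('Correlation of the dataset', size=16)
plt.show()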

**The strongest correlations are between suicides_no and population, and between population and GDP For Year.**
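Instead of scanning the heatmap, you can list each column pair once and sort by absolute correlation, so the strongest relationships come first. A sketch on made-up data that includes a text column, like the real dataset:

```python
import pandas as pd

# Made-up numeric data plus a string column (skipped by numeric_only=True)
df = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'population': [100, 200, 300, 400, 500],
    'suicides_no': [1, 3, 2, 5, 4],
    'year': [1995, 2005, 1985, 2000, 1990],
})

corr = df.corr(numeric_only=True)
cols = corr.columns

# One entry per unordered pair (upper triangle, no self-pairs)
pairs = pd.Series({
    (a, b): corr.loc[a, b]
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
})
# Strongest relationships first, regardless of sign
pairs = pairs.sort_values(ascending=False, key=lambda v: v.abs())
print(pairs)
```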

In [None]:
data.groupby(["sex"])["suicides_no"].sum().reset_index()

In [None]:
sns.barplot(x="sex", y="suicides_no", data=data, palette="Blues_d")
plt.show()

**Males account for far more suicides than females.**
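The groupby cell above reports totals. `sns.barplot`, by contrast, plots the mean of `suicides_no` per row (its default estimator) with an error bar. Both point the same way here, but the bar heights are not the totals. A sketch on made-up rows that contrasts the two aggregations:

```python
import pandas as pd

# Made-up rows, three per sex
df = pd.DataFrame({
    'sex': ['male', 'male', 'male', 'female', 'female', 'female'],
    'suicides_no': [30, 60, 90, 10, 20, 30],
})

# Total per sex: what the groupby(...).sum() cell reports
totals = df.groupby('sex')['suicides_no'].sum()
# Mean per row: what sns.barplot draws by default
means = df.groupby('sex')['suicides_no'].mean()
print(totals)
print(means)
```

Passing `estimator=sum` to `sns.barplot` would make the bars show totals instead of means.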

In [None]:
country= data.groupby(['country'])['suicides_no'].sum().reset_index()
country = country.sort_values('suicides_no',ascending=False)
country=country.head()
plt.subplots(figsize=(15,6))
sns.barplot(x='country', y='suicides_no', data=country,color='fuchsia')

**This plot shows the five countries with the highest total number of suicides over the whole period.**
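The groupby, sort and `head()` steps above can be condensed with `Series.nlargest`, which returns the top values already in descending order. A sketch on made-up rows, where the country names are placeholders:

```python
import pandas as pd

# Made-up rows with placeholder country names
df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C', 'C', 'D', 'E', 'F'],
    'suicides_no': [5, 5, 30, 1, 2, 7, 4, 20],
})

# Same result as groupby-sum, sort descending, then head(5)
top5 = df.groupby('country')['suicides_no'].sum().nlargest(5)
print(top5)
```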