**<H1> Suicide Trend Analysis**

The given dataset includes the following parameters:
* Country
* Year
* Sex
* Age
* Number of suicides
* Population
* Country-year
* HDI for year
* GDP for year
* GDP per capita
* Generation

**<h2> Importing the Libraries**

In [None]:
#Import Of Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
import warnings
warnings.simplefilter("ignore")

**<h2> Uploading Dataset**

The command below reads the dataset from a CSV file and loads it into a pandas dataframe so that we can work with the data effectively. The dataframe is accessible through the variable named data.

In [None]:
data=pd.read_csv('../input/sucidedata/SucideData.csv')

In [None]:
data.head()

In [None]:
data.country.unique()

The next line displays the data type of each column.

In [None]:
data.dtypes

The next line gives a concise summary of the dataframe.

In [None]:
data.info()

**<h2> Exploratory Data Analysis**

In [None]:
#Data Cleaning
data["gdp_for_year"]=data[" gdp_for_year"]

In [None]:
data.drop(" gdp_for_year",axis=1,inplace=True)
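
The copy-and-drop steps above can also be written as a single rename. The following is a minimal sketch on a toy dataframe (the name toy and its values are made up for illustration, not taken from the dataset); it assumes the only problem with the header is the leading space.

```python
import pandas as pd

# Toy frame whose header carries a leading space, mimicking " gdp_for_year".
# The values are illustrative placeholders, not real data.
toy = pd.DataFrame({
    "year": [1987, 1988],
    " gdp_for_year": ["2,156,624,900", "2,126,000,000"],
})

# Option 1: rename the single offending column in one step.
renamed = toy.rename(columns={" gdp_for_year": "gdp_for_year"})

# Option 2: strip surrounding whitespace from every column name.
stripped = toy.copy()
stripped.columns = stripped.columns.str.strip()

print(list(renamed.columns))
print(list(stripped.columns))
```

Unlike copy-and-drop, which moves the column to the end of the dataframe, a rename keeps the column in its original position.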

In [None]:
data.info()



gdp_for_year is a numerical feature, but because its values contain comma separators, it is stored as a string.


In [None]:
data["gdp_for_year"]=data.gdp_for_year.str.replace(",","")

In [None]:
data.head()

In [None]:
data["gdp_for_year"]=data["gdp_for_year"].astype(float)
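
To illustrate the conversion on its own, here is a small sketch using a made-up series (gdp_text and its values are placeholders, not rows from the dataset). It removes the comma separators and casts the result to float, in the same way as the cells above.

```python
import pandas as pd

# Illustrative comma-separated numbers stored as text.
gdp_text = pd.Series(["2,156,624,900", "63,067,077,179", "1,000"])

# Remove the thousands separators literally (regex=False), then cast to float.
gdp_clean = gdp_text.str.replace(",", "", regex=False).astype(float)

# pd.to_numeric with errors="coerce" is a more forgiving alternative:
# values that cannot be parsed become NaN instead of raising an error.
gdp_num = pd.to_numeric(gdp_text.str.replace(",", "", regex=False), errors="coerce")

print(gdp_clean.dtype)
print(gdp_clean.tolist())
```

pd.read_csv also accepts a thousands parameter, which may make this cleaning step unnecessary if it is set when the file is loaded.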

**<h3>Count Plot for Generations**

In [None]:
plt.figure(figsize=(30,5))
sns.countplot(x='generation',data=data)

**<h2> Label Encoding**

In [None]:
from sklearn.preprocessing import LabelEncoder
label_encoder1=LabelEncoder()
data["sex"]=label_encoder1.fit_transform(data["sex"])

In [None]:
label_encoder2=LabelEncoder()
data["generation"]=label_encoder2.fit_transform(data["generation"])
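
It helps to know which integer each category receives. The sketch below uses a short made-up list (sexes is a placeholder, assuming the column holds only the strings "male" and "female") and shows how to read the mapping back from a fitted encoder.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values; the real column is assumed to contain these two labels.
sexes = ["male", "female", "female", "male"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sexes)

# classes_ is sorted, so codes follow alphabetical order of the labels.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original labels from the integer codes.
decoded = encoder.inverse_transform(codes)

print(mapping)
print(list(codes))
print(list(decoded))
```

Because the codes follow alphabetical order, the integers assigned to generation do not reflect chronological order, and plots drawn after encoding will show 0 and 1 instead of the original sex labels.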

In [None]:
data=data.fillna(0)

In [None]:
data.drop(columns=["country","country-year","age"],inplace=True)

In [None]:
data.head()

**<h3> Comparing Number of Suicides Between Both the Sexes**

In [None]:
sns.barplot(x='sex',y='suicides_no',data=data)

**<h3>GDP trend over the given years**

In [None]:
plt.figure(figsize=(15,5))
sns.lineplot(x='year',y='gdp_for_year',data=data)

**<h3> Number of suicides over the given period for both sexes**

In [None]:
plt.figure(figsize=(15,5))
sns.lineplot(x='year',y='suicides_no',hue='sex',data=data)

In [None]:
data.dtypes

**<h3>Using Correlation heatmap to find important features and their relations with other features.**

In [None]:
#Correlation Matrix
data.corr()

In [None]:
plt.figure(figsize=(10,10))
sns.heatmap(data.corr(),annot=True)
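
One way to read a correlation matrix numerically is to rank the features by the absolute value of their correlation with the target. This is a rough sketch on synthetic data (the toy dataframe, its noise column, and the random values are all invented for illustration), not a result from the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: suicides_no depends on population, noise does not.
rng = np.random.default_rng(0)
population = rng.uniform(1e4, 1e6, size=200)
toy = pd.DataFrame({
    "population": population,
    "suicides_no": population * 1e-4 + rng.normal(0, 5, size=200),
    "noise": rng.normal(size=200),
})

# Pearson correlation between every pair of numeric columns.
corr = toy.corr()

# Rank features by absolute correlation with the target column.
ranking = (
    corr["suicides_no"]
    .drop("suicides_no")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Correlation only captures linear relationships, so a low value does not by itself rule out a feature.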

In [None]:
#Features (train) and target (test); the actual train/test split happens later
train=data.drop("suicides_no",axis=1)
test=data["suicides_no"]

**<h3> Density Plot for population**

In [None]:
sns.set_style('whitegrid')

In [None]:
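#distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest replacement
sns.histplot(train['population'],bins=100,kde=True,stat="density")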

In [None]:
sns.histplot(np.log(train['population']),bins=100,kde=True,stat="density")

**<h3> Studying Trends between Year and Suicide Numbers**

In [None]:
fig, ax = plt.subplots()
# figure size in inches (width, height)
fig.set_size_inches(13.7, 10.27)
sns.barplot(x='year',y='suicides_no',data=data,ax=ax)

**<h3> Studying Trends between Year and Population**

In [None]:
fig, ax = plt.subplots()
# figure size in inches (width, height)
fig.set_size_inches(13.7, 10.27)
sns.barplot(x='year',y='population',data=train,ax=ax)

In [None]:
#Analysis of all features
sns.pairplot(data,hue="sex")

**<h2> Machine Learning Model**

**<h3> Splitting Training and Test Data**

We import model_selection from sklearn to split the data into training and test sets.

In [None]:
#Library for Training Model
from sklearn import model_selection

In [None]:
#Train Test Split
x_train,x_test,y_train,y_test=model_selection.train_test_split(train,test)
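
When neither test_size nor train_size is given, train_test_split holds out 25% of the rows for testing, and the split changes on every run unless random_state is fixed. The sketch below shows this on made-up arrays (X, y, and the seed value 0 are placeholders chosen for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 20 samples with 2 features each.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# An explicit test_size and a fixed random_state make the split reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Repeating the call with the same seed yields the same partition.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.25, random_state=0)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

Fixing random_state would make the RMSE values of the models comparable from one run of the notebook to the next.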

**<h3> Importing Algorithms**

The following algorithms will be trained on the dataset, and the one with the lowest RMSE on the test set will be considered the most suitable:
* Random Forest
* Decision Tree
* Linear Regression
* Support Vector Regression


In [None]:
#Import the four regression algorithms to be compared
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

**<h4> Testing on Random Forest Regressor**

The forest uses 50 estimators (trees), and the split criterion is mean squared error.

In [None]:
#The "mse" criterion was renamed to "squared_error" in scikit-learn 1.0
alg1=RandomForestRegressor(n_estimators=50,random_state=0,criterion="squared_error")
alg1.fit(x_train,y_train)

**<h4> Testing on Decision Tree Regressor**



In [None]:
alg2=DecisionTreeRegressor()
alg2.fit(x_train,y_train)

**<h4> Testing on Linear Regressor**

In [None]:
alg3=LinearRegression()
alg3.fit(x_train,y_train)

**<h4> Testing on Support Vector Regressor**

In [None]:
alg4=SVR()
alg4.fit(x_train,y_train)

**<h4> Predicting the y values from our given models**

In [None]:
#Prediction
y_pred_1=alg1.predict(x_test)
y_pred_2=alg2.predict(x_test)
y_pred_3=alg3.predict(x_test)
y_pred_4=alg4.predict(x_test)

**<h4> Calculating and Printing the RMSE**

In [None]:
from sklearn import metrics

In [None]:
print("Random Forest RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred_1)))
print("Decision Tree RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred_2)))
print("Linear Regression RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred_3)))
print("SVR RMSE:",np.sqrt(metrics.mean_squared_error(y_test,y_pred_4)))
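
RMSE is the square root of the mean of the squared prediction errors, so a lower value means a better fit, and it is expressed in the same unit as the target. The following sketch checks the formula by hand against the scikit-learn call used above; the arrays y_true and y_hat are made-up numbers, not model output.

```python
import numpy as np
from sklearn import metrics

# Illustrative targets and predictions.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

# Manual computation: errors are 1, 0, -2, -1, so the squared errors
# are 1, 0, 4, 1, their mean is 1.5, and the RMSE is sqrt(1.5).
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# Same quantity via scikit-learn, as in the cell above.
rmse_sklearn = np.sqrt(metrics.mean_squared_error(y_true, y_hat))

print(rmse_manual, rmse_sklearn)
```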