# Introduction

This notebook explores the World Happiness Report, a landmark survey of the state of global happiness.

In this tutorial, I apply several machine learning regression models to the 2019 report.

<font color='red'>
Content:
    
1. [Load and Check Data](#1)
2. [Variable Description](#2)
3. [Correlation](#3)
4. [Linear Regression](#4)
    * [Simple Linear Regression](#5)
    * [Multiple Linear Regression](#6)
    * [Polynomial Linear Regression](#7)
5. [Decision Tree Regression](#8)
6. [Random Forest Regression](#9)
7. [R-Square](#10)
    * [R-Square with Linear Regression](#11)
    * [R-Square with Random Forest Regression](#12)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

from plotly.offline import init_notebook_mode, iplot, plot
import plotly as py
init_notebook_mode(connected=True)
import plotly.graph_objs as go

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<a id="1"></a> <br>
## Load and Check Data

* First of all, we are going to read the reports in our dataset.

In [None]:
data_2015=pd.read_csv("../input/world-happiness/2015.csv")
data_2016=pd.read_csv("../input/world-happiness/2016.csv")
data_2017=pd.read_csv("../input/world-happiness/2017.csv")
data_2018=pd.read_csv("../input/world-happiness/2018.csv")
data_2019=pd.read_csv("../input/world-happiness/2019.csv")

In [None]:
#Summary Analysis

data_2019.head()

In [None]:
# Check the data types and look for missing values.

#data_2015.info()
#data_2016.info()
#data_2017.info()
#data_2018.info()
data_2019.info()

As you can see, the yearly reports do not all share the same columns, so I will work with the 2019 report only.

In the 2019 report:
* Length: 156 rows (RangeIndex)
* All features are floats except rank and country.
* There are no NaN values in this report.
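
The checks listed above can also be reproduced programmatically instead of being read off the `info()` output. A minimal sketch, assuming a small hypothetical frame `sample` that mimics three columns of the 2019 report (the values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for data_2019; values are illustrative only
sample = pd.DataFrame({
    "Overall rank": [1, 2, 3],
    "Country or region": ["A", "B", "C"],
    "Score": [7.5, 7.4, 7.3],
})

# Count missing values per column
missing_per_column = sample.isna().sum()

# List the columns stored as 64-bit floats
float_columns = sample.select_dtypes(include="float64").columns.tolist()

print(missing_per_column)
print(float_columns)
```

Running the same two calls on `data_2019` should confirm the bullet points without scanning the output by eye.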

In [None]:
# Summary statistics of the numeric columns.

data_2019.describe()

<a id="2"></a> <br>
## Variable Description

* Overall rank: the country's position in the ranking
* Country or region: name of the country or region
* Score: happiness score (a metric measured by asking sampled people: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?")
* GDP per capita: national income per person
* Social support
* Healthy life expectancy
* Freedom to make life choices
* Generosity
* Perceptions of corruption

<a id="3"></a> <br>
## Correlation

In [None]:
#Correlation Map

list1=["Score","GDP per capita","Social support","Healthy life expectancy","Freedom to make life choices","Generosity","Perceptions of corruption"]
fig, ax = plt.subplots(figsize=(12,10)) 
sns.heatmap(data_2019[list1].corr(), annot=True, cmap="YlGnBu", linewidths=.5, fmt= '.1f',ax=ax)
plt.show()

From the heatmap above, we can draw the following conclusions:
* The happiness score is strongly correlated with GDP per capita, social support, and healthy life expectancy.
* The happiness score shows almost no linear correlation with generosity.

So, as a first conclusion, the happiest countries tend to be the ones with higher GDP per capita, social support, and healthy life expectancy.
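
Reading a ranking off a heatmap can be error-prone, so the same information can be pulled out as a sorted column. A minimal sketch, assuming a hypothetical miniature frame `sample` with made-up numbers (not the real report) and Pearson correlation, which is the default of `DataFrame.corr()`:

```python
import pandas as pd

# Hypothetical miniature frame; the numbers are made up for illustration
sample = pd.DataFrame({
    "Score": [7.0, 6.0, 5.0, 4.0, 3.0],
    "GDP per capita": [1.4, 1.2, 1.0, 0.7, 0.4],
    "Generosity": [0.2, 0.1, 0.3, 0.1, 0.2],
})

# Pearson correlation of every feature with Score, strongest first
corr_with_score = (
    sample.corr()["Score"]
    .drop("Score")
    .sort_values(ascending=False)
)
print(corr_with_score)
```

Applied to `data_2019[list1]`, the same expression would list the features in order of their correlation with `Score`.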

<a id="4"></a> <br>
## Linear Regression

<a id="5"></a> <br>
### Simple Linear Regression
* Score - GDP per capita

In [None]:
linear_reg=LinearRegression()

x=data_2019.Score.values.reshape(-1,1)
y=data_2019["GDP per capita"].values.reshape(-1,1)

linear_reg.fit(x,y)

plt.scatter(data_2019["Score"],data_2019["GDP per capita"])
y_head = linear_reg.predict(x) # predicted GDP per capita
plt.plot(x,y_head,color="red")
plt.show()

<a id="6"></a> <br>
### Multiple Linear Regression
* Score - Social support - GDP per capita

In [None]:
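# Features: Score and Social support; target: GDP per capita
x = data_2019[["Score","Social support"]].values
y = data_2019["GDP per capita"].values.reshape(-1,1)

# %%
multiple_linear_regression=LinearRegression()
multiple_linear_regression.fit(x,y)
y_head = multiple_linear_regression.predict(x) # predicted GDP per capita

# A single 2D line cannot show a fit over two features,
# so plot predicted against actual values instead.
plt.scatter(y,y_head)
plt.plot([y.min(),y.max()],[y.min(),y.max()],color="green") # perfect-fit reference line
plt.xlabel("Actual GDP per capita")
plt.ylabel("Predicted GDP per capita")
plt.show()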

<a id="7"></a> <br>
### Polynomial Linear Regression
* Score - Healthy life expectancy

In [None]:
x=data_2019.Score.values.reshape(-1,1)
y=data_2019["Healthy life expectancy"].values.reshape(-1,1)
plt.scatter(data_2019["Score"],data_2019["Healthy life expectancy"])
plt.show()

In [None]:
x=data_2019.Score.values.reshape(-1,1)
y=data_2019["Healthy life expectancy"].values.reshape(-1,1)
plt.scatter(data_2019["Score"],data_2019["Healthy life expectancy"],color="yellow")

polynomial_regression=PolynomialFeatures(degree=4)
x_polynomial=polynomial_regression.fit_transform(x)

#%% fit
linear_regression2=LinearRegression()
linear_regression2.fit(x_polynomial,y)

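#%% predict
y_head2=linear_regression2.predict(x_polynomial)

# Sort by x so the fitted curve is drawn left to right instead of zigzagging
order=x.ravel().argsort()
plt.plot(x[order],y_head2[order],color="purple",label="poly")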
plt.legend()
plt.show()
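
`PolynomialFeatures` does not fit a curve by itself. It expands each input value into a bias term and its powers, so that an ordinary `LinearRegression` can then fit a polynomial. A minimal sketch with two made-up inputs and degree 2 (the cell above uses degree 4); the names `x_demo` and `expanded` are illustrative only:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples with a single feature each (made-up values)
x_demo = np.array([[2.0], [3.0]])

# Each row becomes [1, x, x^2]
expanded = PolynomialFeatures(degree=2).fit_transform(x_demo)
print(expanded)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

With `degree=4`, each row of `x_polynomial` therefore holds five columns: the constant 1 and the first four powers of `Score`.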

<a id="8"></a> <br>
## Decision Tree Regression

In [None]:
x=data_2019["Freedom to make life choices"].values.reshape(-1,1)
y=data_2019["Generosity"].values.reshape(-1,1)

tree_reg=DecisionTreeRegressor()      
tree_reg.fit(x,y)

x_=np.arange(x.min(),x.max(),0.01).reshape(-1,1) # evenly spaced grid over the feature range
y_head=tree_reg.predict(x_)

#%% Visualize
plt.scatter(x,y,color="red")
plt.plot(x_,y_head,color="green")
plt.xlabel("Freedom to make life choices")
plt.ylabel("Generosity")
plt.show()

<a id="9"></a> <br>
## Random Forest Regression

In [None]:
x=data_2019["Freedom to make life choices"].values.reshape(-1,1)
y=data_2019["Generosity"].values # 1D target for RandomForestRegressor

rf=RandomForestRegressor(n_estimators=100,random_state=42)
rf.fit(x,y)

x_=np.arange(x.min(),x.max(),0.01).reshape(-1,1) # evenly spaced grid over the feature range
y_head=rf.predict(x_)

#%% Visualize
plt.scatter(x,y,color="red")
plt.plot(x_,y_head,color="green")
plt.xlabel("Freedom to make life choices")
plt.ylabel("Generosity")
plt.show()

<a id="10"></a> <br>
## R-Square
* SSR = Sum of Squared Residuals = Sum[(y-y_head)^2], where y_head is the predicted value
* SST = Total Sum of Squares = Sum[(y-y_average)^2], where y_average is the mean of y
* **R-Square = 1 - (SSR / SST)**
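
The three formulas above can be checked by hand on a tiny example. A minimal sketch with made-up observations and predictions (`y_demo` and `y_head_demo` are illustrative names, not part of the notebook's data):

```python
import numpy as np

y_demo = np.array([3.0, 5.0, 7.0, 9.0])        # observed values (made up)
y_head_demo = np.array([2.5, 5.5, 6.5, 9.5])   # predictions (made up)

ssr = np.sum((y_demo - y_head_demo) ** 2)       # sum of squared residuals
sst = np.sum((y_demo - y_demo.mean()) ** 2)     # total sum of squares
r_square = 1 - ssr / sst

print(ssr, sst, r_square)
```

Here SSR is 1.0 and SST is 20.0, so R-Square is 0.95. `sklearn.metrics.r2_score(y_demo, y_head_demo)` applies the same definition.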

<a id="11"></a> <br>
### R-Square with Linear Regression

In [None]:
linear_reg=LinearRegression()

x=data_2019.Score.values.reshape(-1,1)
y=data_2019["GDP per capita"].values.reshape(-1,1)

linear_reg.fit(x,y)

#%% Visualize
plt.scatter(data_2019["Score"],data_2019["GDP per capita"])
y_head = linear_reg.predict(x) # predicted GDP per capita
plt.plot(x,y_head,color="red")

#%% R-Square
from sklearn.metrics import r2_score
print("r_square score: ",r2_score(y,y_head))

plt.show()

<a id="12"></a> <br>
### R-Square with Random Forest Regression

In [None]:
x=data_2019["Freedom to make life choices"].values.reshape(-1,1)
y=data_2019["Perceptions of corruption"].values # 1D target for RandomForestRegressor

rf=RandomForestRegressor(n_estimators=100,random_state=42)
rf.fit(x,y)

y_head=rf.predict(x)

#%% R-Square (computed on the same rows the model was trained on)
from sklearn.metrics import r2_score
print("r_square score: ",r2_score(y,y_head))

x_=np.arange(x.min(),x.max(),0.01).reshape(-1,1) # evenly spaced grid over the feature range
y_head=rf.predict(x_)

#%% Visualize
plt.scatter(x,y,color="red")
plt.plot(x_,y_head,color="green")
plt.xlabel("Freedom to make life choices")
plt.ylabel("Perceptions of corruption")
plt.show()
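
The R-Square above is computed on the same rows the forest was trained on, which tends to overstate how well the model generalizes. A minimal sketch of a held-out evaluation, assuming synthetic data (a noisy sine curve generated here, not the happiness dataset) and `train_test_split` from scikit-learn; all `*_demo` names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: one feature, noisy sine target (assumption for illustration)
rng = np.random.default_rng(42)
x_demo = rng.uniform(0, 1, size=(200, 1))
y_demo = np.sin(2 * np.pi * x_demo[:, 0]) + rng.normal(0, 0.5, size=200)

# Hold out 25% of the rows for evaluation
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=42
)

rf_demo = RandomForestRegressor(n_estimators=100, random_state=42)
rf_demo.fit(x_train, y_train)

# Score on rows the model has seen and on rows it has not
r2_train = r2_score(y_train, rf_demo.predict(x_train))
r2_test = r2_score(y_test, rf_demo.predict(x_test))
print("train R-Square:", r2_train)
print("test R-Square:", r2_test)
```

The gap between the two scores gives a rough sense of how much the training-set figure flatters the model. The same split could be applied to the `Freedom to make life choices` and `Perceptions of corruption` columns used above.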