## BACKGROUND

The World Happiness Report is a landmark survey of the state of global happiness that ranks 156 countries by how happy their citizens perceive themselves to be. The World Happiness Report 2020 for the first time ranks cities around the world by their subjective well-being and digs more deeply into how the social, urban and natural environments combine to affect our happiness.

In July 2011, the UN General Assembly adopted resolution 65/309, 'Happiness: Towards a Holistic Definition of Development', inviting member countries to measure the happiness of their people and to use the data to help guide public policy. On April 2, 2012, this was followed by the first UN High Level Meeting, called Wellbeing and Happiness: Defining a New Economic Paradigm, which was chaired by UN Secretary General Ban Ki-moon and Prime Minister Jigme Thinley of Bhutan, a nation that adopted gross national happiness instead of gross domestic product as its main development indicator.

Drawing international attention, the first report outlined the state of world happiness, causes of happiness and misery, and policy implications highlighted by case studies. In 2013, the second World Happiness Report was issued, and since then it has been issued on an annual basis with the exception of 2014. The report primarily uses data from the Gallup World Poll. Each annual report is available to the public to download on the World Happiness Report website.



## METHODS AND PHILOSOPHY

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors. In the reports, experts in fields including economics, psychology, survey analysis, and national statistics describe how measurements of well-being can be used effectively to assess the progress of nations, among other topics.

Each report is organized by chapters that delve deeper into issues relating to happiness, including mental illness, the objective benefits of happiness, the importance of ethics, policy implications, and links with the Organisation for Economic Co-operation and Development's (OECD) approach to measuring subjective well-being and other international and national efforts.

https://worldhappiness.report

## FEATURES ANALYZED


* **rank** : Rank in Happiness
* **region** : Country or region name
* **score** : Happiness Score
* **gdp** : GDP per capita is a measure of a country's economic output that accounts for its number of people.
* **support** : Social support means having friends and other people, including family, to turn to in times of need or crisis to give you a broader focus and positive self-image. Social support enhances quality of life and provides a buffer against adverse life events.
* **life_exp** : Healthy Life Expectancy is the average number of years that a newborn can expect to live in "full health", in other words, not hampered by disabling illnesses or injuries.
* **freedom** : Freedom of choice describes an individual's opportunity and autonomy to perform an action selected from at least two available options, unconstrained by external parties.
* **generosity** : Quality of being kind and generous.
* **corruption** : The Corruption Perceptions Index (CPI) is an index published annually by Transparency International since 1995 which ranks countries "by their perceived levels of public sector corruption, as determined by expert assessments and opinion surveys."

### WHAT MAKES PEOPLE HAPPY?

* Is it GDP per capita that makes you happy?
* Is it the perception of government corruption that makes you sad?
* Is it freedom of life choices that makes you happy?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# LOADING MAIN LIBRARIES

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns

In [None]:
plt.rcParams['figure.figsize'] = (15, 15)
%matplotlib inline

In [None]:
!pip install chart-studio

In [None]:
# VISUALIZATION LIBRARIES

import chart_studio.plotly as py 
import plotly.graph_objs as go
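from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

# Enable offline plotting inside the notebook so iplot can render figures
init_notebook_mode(connected=True)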


In [None]:
# LOADING THE DATA

df_2015 = pd.read_csv('../input/world-happiness/2015.csv')
df_2016 = pd.read_csv('../input/world-happiness/2016.csv')
df_2017 = pd.read_csv('../input/world-happiness/2017.csv')
df_2018 = pd.read_csv('../input/world-happiness/2018.csv')
df_2019 = pd.read_csv('../input/world-happiness/2019.csv')

### LET'S TAKE A LOOK AT THE FEATURES OF EACH DATAFRAME

Also, select the columns needed.

In [None]:
df_2019.info()

In [None]:
df_2019.columns = ["rank","region","score", "gdp","support","life_exp","freedom","generosity","corruption"]

In [None]:
df_2018.info()

In [None]:
df_2018.columns = ["rank","region","score", "gdp","support","life_exp","freedom","generosity","corruption"]

In [None]:
df_2017.info()

In [None]:
df_2017.drop(["Whisker.high","Whisker.low","Dystopia.Residual"],axis=1,inplace=True)
df_2017.columns =  ["region","rank","score","gdp","support", "life_exp","freedom","generosity","corruption"]

In [None]:
df_2016.info()

In [None]:
df_2016.drop(['Region','Lower Confidence Interval','Upper Confidence Interval','Dystopia Residual'],axis=1,inplace=True)
# Assumes the 2016 file, like 2015, lists 'Trust (Government Corruption)' before 'Generosity'
df_2016.columns = ["region","rank","score","gdp","support","life_exp","freedom","corruption","generosity"]

In [None]:
df_2015.info()

In [None]:
df_2015.drop(["Region",'Standard Error', 'Dystopia Residual'],axis=1,inplace=True)
df_2015.columns = ["region", "rank", "score", "gdp", "support","life_exp", "freedom", "corruption", "generosity"]

In [None]:
# ADDING COLUMN FOR YEAR

df_2015["year"] = 2015
df_2016["year"] = 2016
df_2017["year"] = 2017
df_2018["year"] = 2018
df_2019["year"] = 2019

In [None]:
quarter = ['Top','Top-Mid', 'Low-Mid', 'Low' ]
quarter_n = [4, 3, 2, 1]


# Assign each country to a rank quartile per year (rank 1 is the happiest, so the lowest ranks are 'Top')
for frame in (df_2015, df_2016, df_2017, df_2018, df_2019):
    frame["quarter"] = pd.qcut(frame['rank'], len(quarter), labels=quarter)
    frame["quarter_n"] = pd.qcut(frame['rank'], len(quarter), labels=quarter_n)
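
To make the quartile labelling concrete, here is a minimal sketch of how `pd.qcut` splits ranks into four equally sized groups. The eight ranks below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical ranks for eight countries (1 = happiest); not real data.
ranks = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="rank")

labels = ["Top", "Top-Mid", "Low-Mid", "Low"]

# qcut cuts at the quantiles of the values, so each label covers
# roughly the same number of countries.
quarter_demo = pd.qcut(ranks, q=len(labels), labels=labels)

print(pd.DataFrame({"rank": ranks, "quarter": quarter_demo}))
```

Because the lowest rank numbers fall into the first bin, the best-ranked countries receive the 'Top' label.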

In [None]:
# APPENDING ALL TOGETHER
# DataFrame.append was removed in pandas 2.0, so concatenate the yearly frames instead
df = pd.concat([df_2015, df_2016, df_2017, df_2018, df_2019])

In [None]:
#CHECKING FOR MISSING DATA
print(df.isnull().any())

#filling missing values of corruption with its mean
df['corruption'] = df['corruption'].fillna(df['corruption'].mean())

That's how our data ended up.

In [None]:
df

### EDA 

Now that we have set up our data, let's get some insights from the visualizations.

In [None]:
df.columns

In [None]:
# CHECKING NUMERICAL DATA
df.describe()

How are the values distributed in each feature?

In [None]:
# DISTRIBUTION OF ALL NUMERIC DATA
plt.rcParams['figure.figsize'] = (15, 15)
df.hist();

We can see that most numeric features look roughly normally distributed. There are also a few outliers in 'support', 'corruption' and 'generosity', as the boxplots below show, but they should not materially affect the analysis.

In [None]:
plt.rcParams['figure.figsize'] = (10, 10)
df[['score', 'gdp', 'support', 'life_exp', 'freedom',
       'corruption', 'generosity']].boxplot();
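
A quick numeric check can complement the histograms and boxplots. The sketch below computes the sample skewness with pandas on two synthetic columns (both generated here purely for illustration): values near 0 suggest a roughly symmetric distribution, while clearly positive values suggest a long right tail.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; these are not columns of the dataset.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "roughly_normal": rng.normal(loc=5.0, scale=1.0, size=500),
    "right_skewed": rng.exponential(scale=0.2, size=500),
})

# Skewness near 0 means symmetric; a large positive value means a right tail.
skewness = demo.skew()
print(skewness.round(2))
```

The same call could be applied to the notebook's data, for example `df[['score', 'gdp', 'support', 'life_exp', 'freedom', 'corruption', 'generosity']].skew()`.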

Let's check which countries are better positioned in each of the aspects being analyzed.

In [None]:
# CHECK FOR TOP COUNTRIES IN EACH FEATURE

fig, axes = plt.subplots(nrows=3, ncols=2,constrained_layout=True,figsize=(10,10));

sns.barplot(x='gdp',y='region',data=df.nlargest(10,'gdp'),ax=axes[0,0],palette="RdYlGn_r")
sns.barplot(x='support' ,y='region',data=df.nlargest(10,'support'),ax=axes[0,1],palette="RdYlGn_r")
sns.barplot(x='life_exp' ,y='region',data=df.nlargest(10,'life_exp'),ax=axes[1,0],palette='RdYlGn_r')
sns.barplot(x='freedom' ,y='region',data=df.nlargest(10,'freedom'),ax=axes[1,1],palette='RdYlGn_r')
sns.barplot(x='generosity' ,y='region',data=df.nlargest(10,'generosity'),ax=axes[2,0],palette='RdYlGn_r')
sns.barplot(x='corruption' ,y='region',data=df.nlargest(10,'corruption'),ax=axes[2,1],palette='RdYlGn_r')

Since I'm Brazilian, I want to check how Brazil has performed over the years:




In [None]:
df.loc[df['region']=='Brazil']

Although Brazil has always been in the top 25% of the ranking, its position has been changing.
It's interesting how Brazil has been dropping in the ranking. I believe this is the result of a feeling of political and economic uncertainty in recent years, as Brazil has moved from a populist government to a stricter one.

### Now, I want to compare the top 10 economies in 2020.

In [None]:
# COMPARING BIGGEST ECONOMIES IN THE WORLD

top_econ = ['Brazil','India','China', 'United States', 'Japan', 'Germany', 'United Kingdom', 'France', 'Italy','Canada']

df_top = df[(df['region'].isin(top_econ))].sort_values(['region', 'year'])
df_top.reset_index(drop=True)

It's quite interesting to see that the top 10 economies are distributed all across the ranking. Even though a high GDP per capita is assumed to bring happiness, the reality shows something different.

In a later section I'll check how the data are correlated.

## MAP VISUALIZATION

Data visualization provides us with a quick, clear understanding of the information. Thanks to graphic representations, we can visualize large volumes of data in an understandable and coherent way, which in turn helps us comprehend the information and draw conclusions and insights.

That being said, let's check some map visualizations of the features.

In [None]:
# Happiness Rank Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['rank'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = True)
layout = dict(title = 'Happiness Rank Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

In the previous map, we can see that American and European countries are better ranked. In contrast, African and Asian countries are 'in theory' less happy.

The second map shows that the rank and score are strongly correlated, which makes sense since the higher the score, the better the rank.

In [None]:
#Happiness Score Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['score'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'Happiness Score Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

In terms of GDP, African countries show critically low values as a result of several long-standing challenges that hold back progress. Around 640 million people currently live without electricity in Africa, 210 million of whom are in fragile and conflict-affected countries. Public debt levels and debt risk are rising, which might jeopardize debt sustainability in some countries; the availability of good jobs has not kept pace with the number of entrants into the labor force; fragility is costing the subcontinent half a percentage point of growth per year; and gender gaps persist and are keeping the continent from reaching its full growth and innovation potential. More than 416 million Africans still live in extreme poverty.

In [None]:
#GDP per Capita Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['gdp'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'GDP per Capita Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

Studies have demonstrated that social isolation and loneliness are associated with a greater risk of poor mental health and poor cardiovascular health, as well as other health problems. Other studies have shown the benefit of a network of social support, including the following:

* Improving the ability to cope with stressful situations
* Alleviating the effects of emotional distress
* Promoting lifelong good mental health
* Enhancing self-esteem
* Lowering cardiovascular risks, such as lowering blood pressure
* Promoting healthy lifestyle behaviors
* Encouraging adherence to a treatment plan

In [None]:
#Perception of Social Support Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['support'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'Perception of Social Support Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

In [None]:
#Perception of Freedom Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['freedom'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'Perception of Freedom Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

When people trust government institutions (the public service), they’re more likely to take part in government processes. For example, they’re more likely to:

* vote
* use the services they’re entitled to
* provide information about themselves that allows government to provide effective services
* pay taxes, user charges, and licence fees.


On the next map we can see that people all over the world (with few exceptions) trust their government. We can raise a flag here to think further about how this affects the other aspects analyzed.

In [None]:
#Trust in government Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['corruption'], 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'Trust in government Across the World in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

Last, but not least, let's take a look at the quartile  groups (4:'Top',3:'Top-Mid', 2:'Low-Mid', 1:'Low') and check how they're distributed across the world.

In [None]:
#Quartile  groups [4:'Top',3:'Top-Mid', 2:'Low-Mid', 1:'Low' ] Across the World in 2019

map_plot = dict(type = 'choropleth', 
           locations = df_2019['region'],
           locationmode = 'country names',
           z = df_2019['quarter_n'].astype(int), 
           text = df_2019['region'],
          colorscale = 'rdylgn', reversescale = False)
layout = dict(title = 'Happiness Quartile Group in 2019', 
             geo = dict(showframe = False, 
                       projection = {'type': 'equirectangular'}))
choromap = go.Figure(data = [map_plot], layout=layout)
iplot(choromap)

Again, American and European Countries are better ranked.

## PREDICTIVE MODEL OF WORLD HAPPINESS (2015 - 2019)

To proceed with finding a predictive model, we first drop 'rank', 'quarter', and 'quarter_n' from our data frame: all three are derived directly from the happiness score, so they add no independent information to the model. Then, we want to get an overall idea of which variables correlate strongly with each other. Since our focus is the happiness score, let’s concentrate on that column.


In [None]:
drop_rank = df.drop(["rank", 'quarter', 'quarter_n'], axis = 1)

In [None]:
plt.rcParams['figure.figsize'] = (10,10)
sns.pairplot(drop_rank, hue = 'year', corner=True);

Let's take a second look into correlations:

**Kendall’s** tau rank correlation: computed from the numbers of concordant and discordant pairs. Its values are usually smaller than those of Spearman’s rho, it is less sensitive to errors and outliers in the data, and its p-values are more accurate with smaller sample sizes.

In [None]:
# LET'S TAKE A SECOND LOOK INTO CORRELATIONS
df_clean = df.drop(["rank", 'quarter', 'quarter_n', 'year'], axis=1)

# Correlate numeric columns only ('region' holds country names)
kendall_corr = df_clean.select_dtypes(include='number').corr(method='kendall')
kendall_corr
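
To illustrate what "concordant and discordant pairs" means, here is a small hand-rolled sketch of Kendall's tau on four made-up observations. It implements the simple tau-a variant; pandas reports tau-b, which should agree with tau-a when there are no ties. The function name and the numbers are illustrative only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up values for four hypothetical countries.
gdp_demo = [0.3, 0.9, 1.2, 1.4]
score_demo = [4.1, 5.0, 4.8, 7.0]

tau = kendall_tau_a(gdp_demo, score_demo)
print(tau)
```

Of the six possible pairs, five are concordant and one is discordant, which gives (5 - 1) / 6.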

**Spearman** rank-order correlation: evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.



In [None]:
spearman_corr = df_clean.select_dtypes(include='number').corr(method='spearman')
spearman_corr


**Pearson** product moment correlation: evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.

In [None]:
pearson_corr = df_clean.select_dtypes(include='number').corr(method='pearson')
pearson_corr
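
The link between the Spearman and Pearson coefficients can be checked directly: Spearman's rho is the Pearson correlation computed on the ranks of the values. The sketch below shows this on a five-row frame of made-up numbers.

```python
import pandas as pd

# Made-up values for illustration; not rows from the dataset.
demo = pd.DataFrame({
    "gdp": [0.3, 0.9, 1.2, 1.4, 1.6],
    "score": [4.1, 5.0, 4.8, 7.0, 7.1],
})

# Spearman on the raw values ...
spearman = demo.corr(method="spearman").loc["gdp", "score"]

# ... matches Pearson on the ranked values.
pearson_on_ranks = demo.rank().corr(method="pearson").loc["gdp", "score"]

print(spearman, pearson_on_ranks)
```

Only one neighbouring pair of scores is out of order relative to GDP, so the rank correlation stays high even though the relationship is not perfectly linear.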

### Heatmap of correlations

The darker red the square, the stronger the positive correlation; naturally, each variable has a correlation of 1 with itself. We can see that the happiness score is strongly correlated with GDP and Healthy Life Expectancy, followed by Social Support and Freedom. Thus, in our model, we should see that reflected when finding the coefficients.

While trust and generosity do not have a strong positive correlation, we can see that they do have a negative correlation with the happiness score, so it would be beneficial to observe these variables in our model as well.

Obviously, there is an inverse correlation between “Happiness Rank” and all the other numerical variables. In other words, the lower the happiness rank number (closer to 1), the higher the happiness score, and the higher the other six factors that contribute to happiness.

In [None]:
# VISUALIZE CORRELATIONS WITH HEATMAPS
fig, ax = plt.subplots(ncols=3,figsize=(20,5) )
sns.heatmap(kendall_corr, vmin=-1, vmax=1, ax=ax[0], center=0, cmap="RdBu_r", annot=True);
sns.heatmap(spearman_corr, vmin=-1, vmax=1, ax=ax[1], center=0, cmap="RdBu_r", annot=True);
sns.heatmap(pearson_corr, vmin=-1, vmax=1, ax=ax[2], center=0, cmap="RdBu_r", annot=True);

It looks like GDP, Social Support, and Life Expectancy are strongly correlated with the Happiness score. Freedom correlates quite well with the Happiness score; however, it also correlates quite well with all the other features. Government Trust still has only a mediocre correlation with the Happiness score.

Moving on, now that we have a bit of an understanding of the relationships between variables, we can start to use scikit-learn to construct a model. First, we drop the categorical variables and the happiness rank, as that is not something we are exploring in this report. (That being said, we could create dummy variables to look at relationships for countries.)

### Prediction

I will implement several machine learning algorithms to predict the happiness score. First, we split our dataset into training and test sets. Our dependent variable is the happiness score, and the independent variables are GDP per capita, social support, healthy life expectancy, freedom, generosity, and perception of corruption.

In [None]:
df_model = df_clean.drop(['region'], axis = 1)
df_model

In [None]:
# TRAIN TEST SPLIT
from sklearn.model_selection import train_test_split

X = df_model.drop('score', axis =1)
y = df_model.score.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

### Multiple Linear Regression

A simple linear regression is a function that allows someone to make predictions about one variable based on the information that is known about another variable. Linear regression can only be used when one has two continuous variables—an independent variable and a dependent variable. The independent variable is the parameter that is used to calculate the dependent variable or outcome. A multiple regression model extends to several explanatory variables.
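
The description above can be written compactly. As a sketch in generic notation (the symbols $\beta_j$, $x_j$, $p$, and $n$ are introduced here for illustration and are not used elsewhere in this notebook), a multiple regression with $p$ explanatory variables predicts

$$ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p $$

and ordinary least squares chooses the coefficients that minimize the sum of squared residuals over the $n$ observations:

$$ \min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 $$

Here $y$ would be the happiness score and the $x_j$ would be the six features.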

In [None]:
# MULTIPLE LR
import statsmodels.api as sm

X_sm = sm.add_constant(X)  # keep X unchanged so its columns still match the sklearn models
model = sm.OLS(y,X_sm)
model.fit().summary()

### Linear Regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.

In [None]:
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

lm = LinearRegression()
lm.fit(X_train, y_train)

print('Negative MAE: ', np.mean(cross_val_score(lm,X_train,y_train, scoring = 'neg_mean_absolute_error', cv= 3)))

In [None]:
print("Estimated Intercept (constant) is", lm.intercept_)
print("The coefficients of this model are", lm.coef_)

In [None]:
# Pair each feature with its fitted coefficient (X_train columns match the order of lm.coef_)
coef_df = pd.DataFrame(list(zip(X_train.columns, lm.coef_)), columns=['features', 'coefficients'])
coef_df

### The Linear Regression Model for Happiness Score 


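$$
\begin{aligned}
\text{HappinessScore} ={}& 2.177063771973268 + 1.10489896 \times \text{gdp} + 0.69528301 \times \text{support} \\
&+ 0.99638497 \times \text{life expectancy} + 1.4786829 \times \text{freedom} \\
&+ 0.31867999 \times \text{corruption} + 1.16367809 \times \text{generosity}
\end{aligned}
$$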
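
As a small sketch, the equation above can be turned into a plain Python function. The intercept and coefficients are copied from the equation; the helper name `predict_score` and the example feature values are hypothetical and only show how the terms add up.

```python
# Intercept and coefficients copied from the equation above.
INTERCEPT = 2.177063771973268
COEFFICIENTS = {
    "gdp": 1.10489896,
    "support": 0.69528301,
    "life_exp": 0.99638497,
    "freedom": 1.4786829,
    "corruption": 0.31867999,
    "generosity": 1.16367809,
}


def predict_score(features):
    """Apply the linear equation to a dict of feature values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )


# Hypothetical feature values for a single country (not real data).
example = {
    "gdp": 1.0, "support": 1.0, "life_exp": 1.0,
    "freedom": 0.5, "corruption": 0.1, "generosity": 0.2,
}
print(predict_score(example))
```

With every feature set to zero the function returns the intercept alone, which is one way to read the constant term.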

### Lasso regression

Lasso is a type of linear regression that uses shrinkage. Shrinkage is where estimates are shrunk towards a central point; in lasso, the coefficients are shrunk towards zero. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well-suited for models showing high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination.

In [None]:
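# LASSO REGRESSION
# Baseline with the default alpha=1.0
lm_l = Lasso()
print('Negative MAE (alpha=1.0): ', np.mean(cross_val_score(lm_l,X_train,y_train, scoring = 'neg_mean_absolute_error', cv= 3)))

# Try small alpha values and record the cross-validated negative MAE for each
alpha = []
error = []

for i in range(1,10):
    alpha.append(i/1000)
    lml = Lasso(alpha=(i/1000))
    error.append(np.mean(cross_val_score(lml,X_train,y_train, scoring = 'neg_mean_absolute_error', cv= 3)))
    
#plt.plot(alpha,error)

df_err = pd.DataFrame({'alpha': alpha, 'error': error})
best = df_err[df_err.error == max(df_err.error)]
print(best)

# Refit with the best alpha so the test-set comparison uses the tuned model
lm_l = Lasso(alpha=best.alpha.iloc[0])
lm_l.fit(X_train,y_train)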
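
To see the shrinkage effect in isolation, here is a minimal sketch on synthetic data (generated below; it is not the happiness data). Only the first two features actually drive the target, and increasing `alpha` pushes the weaker coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: the target depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

coefs = {}
for alpha_demo in (0.001, 0.1, 1.0):
    lasso = Lasso(alpha=alpha_demo).fit(X_demo, y_demo)
    coefs[alpha_demo] = lasso.coef_
    print(alpha_demo, lasso.coef_.round(3))
```

This is why small values of `alpha` are worth searching over: a large penalty may discard features that still carry some signal.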

### Random Forest

Random forest is an ensemble technique capable of performing both regression and classification tasks with the use of multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging. Bagging, in the Random Forest method, involves training each decision tree on a different data sample where sampling is done with replacement.

In [None]:
# RANDOM FOREST
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()

print ('Negative MAE: ', np.mean(cross_val_score(rf,X_train,y_train,scoring = 'neg_mean_absolute_error', cv= 3)))
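
The "sampling with replacement" step of bagging can be sketched with NumPy alone. This is only an illustration of how one bootstrap sample might be drawn for one tree, not how scikit-learn implements it internally; the ten row ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten hypothetical training rows, identified by their positions.
n_rows = 10
row_ids = np.arange(n_rows)

# One bootstrap sample: same size as the data, drawn with replacement,
# so some rows can repeat and others can be left out.
bootstrap_ids = rng.choice(row_ids, size=n_rows, replace=True)

# Rows never drawn are "out of bag" for this tree.
out_of_bag = np.setdiff1d(row_ids, bootstrap_ids)

print("bootstrap sample:", bootstrap_ids)
print("out of bag:      ", out_of_bag)
```

Each tree in the forest would get its own sample like this, and the forest's regression prediction is the average of the trees' predictions.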


## TUNE MODELS WITH GRIDSEARCHCV


GridSearchCV lets you combine an estimator with a grid search preamble to tune hyper-parameters. The method picks the optimal parameter from the grid search and uses it with the estimator selected by the user. 

In [None]:
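# TUNE MODELS GRIDSEARCHCV
from sklearn.model_selection import GridSearchCV

# Note: scikit-learn 1.0+ names these criteria 'squared_error'/'absolute_error' (formerly 'mse'/'mae'),
# and max_features=1.0 uses all features (formerly 'auto' for regressors)
parameters = {'n_estimators':range(10,300,10), 'criterion':('squared_error','absolute_error'), 'max_features':(1.0,'sqrt','log2')}

gs = GridSearchCV(rf,parameters,scoring='neg_mean_absolute_error',cv=3)
gs.fit(X_train,y_train)

print('Best score: ', gs.best_score_)
gs.best_estimator_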
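
For a quicker feel of how the search works, here is a scaled-down sketch on synthetic data with a four-combination grid. The data, grid values, and variable names are illustrative assumptions. Note that the score is a negative MAE, so values closer to zero are better.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(120, 3))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] + rng.normal(scale=0.05, size=120)

# 2 x 2 = 4 parameter combinations, each evaluated with 3-fold CV.
param_grid = {"n_estimators": [10, 50], "max_features": ["sqrt", 1.0]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best CV MAE:    ", -search.best_score_)
```

The full grid in the cell above has many more combinations, so it can take noticeably longer to run.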

### Test Ensembles

In [None]:

# TEST ENSEMBLES
tpred_lm = lm.predict(X_test)
tpred_lml = lm_l.predict(X_test)
tpred_rf = gs.best_estimator_.predict(X_test)

from sklearn.metrics import mean_absolute_error
print('MAE Linear Regression:          ',mean_absolute_error(y_test,tpred_lm))
print('MAE Lasso Regression:           ',mean_absolute_error(y_test,tpred_lml))
print('MAE Random Forest Regression:   ',mean_absolute_error(y_test,tpred_rf))
print('Average MAE LM+RF:              ',mean_absolute_error(y_test,(tpred_lm+tpred_rf)/2))



### Pick Best Model

In [None]:

# PICKLE MODEL
import pickle

file_name = "model_file.p"
pickl = {'model': gs.best_estimator_}
with open(file_name, "wb") as out_file:
    pickle.dump(pickl, out_file)

# Reload the model to confirm that it round-trips
with open(file_name, 'rb') as pickled:
    data = pickle.load(pickled)
    model = data['model']

# Predict one test row (kept as a DataFrame so the feature names are preserved)
print('Prediction: ', model.predict(X_test.iloc[[1]])[0])

print('Best Estimator: ', gs.best_estimator_)
print('Best score: ', gs.best_score_)

list(X_test.iloc[1,:])



## CONCLUSION


**WHAT MAKES PEOPLE HAPPY?**

Is it GDP per capita that makes you happy?
It seems that the common criticism of the World Happiness Report is quite valid: it places a heavy focus on GDP and gives strong attention to features such as family and life expectancy.
Common wisdom says that money makes you happy up to a certain threshold. Having a good social support network is important, and family tends to provide that. High life expectancy and good health let you worry less about how you'll survive and focus more on the things that make you happy.

Is it the perception of government corruption that makes you sad?
The data on the perception of corruption are quite inconclusive, because it seems that everybody dislikes their government. This may reflect how information spreads nowadays, with people paying more attention to the subject.


Is it freedom of life choices that makes you happy?
As seen in the data, freedom correlates well with all the other features. Freedom is one of the basic human needs, and it plays a central role in social processes. Human development means expanding human choices, which requires freedom. The idea of freedom is complex, and it must be redefined and defended by each generation. Moreover, the value of freedom can only be understood and appreciated by those who have a sense of the past and a highly developed understanding of human nature. All too often, people who live in freedom tend to ignore its fragility and take it for granted. Conversely, people who have not been raised within a long-standing tradition of freedom have trouble understanding and implementing it in their society.




