In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

from plotly.offline import init_notebook_mode, iplot, plot
import plotly as py
init_notebook_mode(connected=True)
import plotly.graph_objs as go
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import plotly.express as px
import eli5
from eli5.sklearn import PermutationImportance
from lightgbm import LGBMRegressor

# Contents 
- [Description](#description)
- [Learning : if you are a newbie](#if_u_newby)
- [Packages Used](#packaged)
- [Reading data files](#reading)
- [2019 data](#2019)
- [2018 data](#2018)
- [2017 data](#2017)
- [2016 data](#2016)
- [2015 data](#2015)
- [Selecting the columns to analyze](#column_selection)
- [Missing Data Analysis](#missing)
- [Scatter and Line Plots and Plotly](#line)
- [Scatter plot  Happiness Score vs GDP per Capita](#score_gdp)
- [Scatter plot  Happiness Score vs Healthy Life Expectancy](#score_life)
- [Scatter plot Happiness Score vs Freedom of Life Choices](#score_choise)
- [Scatter plot Happiness Score vs Generosity](#score_genero)
- [Scatter plot Happiness Score vs Corruption  Perceptions](#score_corupt)
- [Animated Scatter plot  Happiness Score vs GDP per Capita](#anim_score_gdp)
- [Animated Scatter plot  Happiness Score vs Healthy Life Expectancy](#anim_score_life)
- [Animated Scatter plot Happiness Score vs Freedom of Life Choices](#anim_score_choise)
- [Animated Scatter plot Happiness Score vs Generosity](#anim_score_genero)
- [Animated Scatter plot Happiness Score vs Corruption  Perceptions](#anim_score_corupt)
- [Regression Model Fitting using LGBM Regressor and permutation importance](#reg_lgbm)
- [Finding from Permutation Importance](#finding_from_permutation_importance)
- [Finding from All Analysis](#finding_from_all_analysis)

<a id='description'></a>
# Description 

This kernel explores and explains the <span style="color: blue;">**relationship**</span> between the <span style="color: blue;">**happiness score**</span> and other variables such as <span style="color: green;">**GDP per Capita**</span>, <span style="color: green;">**Life Expectancy**</span>, <span style="color: green;">**Freedom**</span> etc.

### Dataset
This kernel uses the World Happiness Report data for the years 2015 to 2019.

#### Context

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

#### Content
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

#### Inspiration
What countries or regions rank the highest in overall happiness and each of the six factors contributing to happiness? How did country ranks or scores change between the 2015 and 2016 as well as the 2016 and 2017 reports? Did any country experience a significant increase or decrease in happiness?

#### What is Dystopia?

Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.

#### What are the residuals?

The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2014-2016 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.2 shows the average residual for each country when the equation in Table 2.1 is applied to average 2014- 2016 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.2, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.85 on the 0 to 10 scale.

#### What do the columns succeeding the Happiness Score (like Family, Generosity, etc.) describe?

The following columns: GDP per Capita, Family, Life Expectancy, Freedom, Generosity, Trust Government Corruption describe the extent to which these factors contribute to the evaluation of happiness in each country. The Dystopia Residual metric is the Dystopia Happiness Score (1.85) plus the Residual value, that is, the unexplained value for each country, as stated in the previous answer.

If you add all these factors up, you get the happiness score, so it might be unreliable to model them to predict Happiness Scores.
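
The additive relationship described above can be checked with a few lines of pandas. This is only a sketch on made-up numbers: the column names and values below are placeholders for illustration, not rows from the dataset.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "gdp_per_capita": [1.30, 0.40],
    "family": [1.25, 0.60],
    "healthy_life_expectancy": [0.90, 0.30],
    "freedom": [0.60, 0.25],
    "generosity": [0.30, 0.20],
    "trust_government_corruption": [0.40, 0.10],
    "dystopia_residual": [2.50, 1.95],
})
toy["score"] = [7.25, 3.80]

# Sum the six factor columns plus the Dystopia residual for each row
component_sum = toy.drop(columns="score").sum(axis=1)

# Largest absolute gap between the reported score and the sum of its parts
gap = (toy["score"] - component_sum).abs()
print(gap.max())
```

On the real files the same check would need to run before any columns are dropped, since the sum only works when every component column is still present.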

#### Data Files 

- 2015.csv
- 2016.csv
- 2017.csv
- 2018.csv 
- 2019.csv

[Link to dataset](https://www.kaggle.com/unsdsn/world-happiness)


<a id='if_u_newby'></a>
# Learning : if you are a newbie

- Pandas 
- Plotly scatter plot, animation etc
- Lightgbm : LightGBM regression.
- eli5 : Permutation Importance

<a id='packaged'></a>
# Packages Used
- [Python Pandas](https://pypi.org/project/pandas/)
- [Plotly](https://plot.ly/)
- [Scikit-Learn](https://scikit-learn.org/)
- [LightGBM]()
- [eli5](https://pypi.org/project/eli5/)

# What makes you happy ?
<div style="width: 700px; height: 200px; overflow: hidden">
![](https://media.tenor.com/images/c85cbd47bcca9123983a539a8a25ad78/tenor.gif)

## Is it GDP per capita that makes you happy?
<div style="width: 700px; height: 300px; overflow: hidden">
<img src="https://i.pinimg.com/originals/35/da/23/35da236b480636ec8ffee367281fe1b1.gif" width="700" height="300" />

## Is it the perception of government corruption that makes you sad?
<div style="width: 700px; height: 400px; overflow: hidden">
<img src="https://media.tenor.com/images/50c6b91a0384dcc0c715abe9326789cd/tenor.gif" width="700" height="400" />

## Is it freedom of life choices that makes you happy?
<div style="width: 700px; height: 400px; overflow: hidden">
<img src="https://media0.giphy.com/media/OmAdpbVnAAWJO/giphy.gif" width="700" height="400" />

## Let us explore the factors of happiness.
<div style="width: 700px; height: 400px; overflow: hidden">
<img src="https://media1.giphy.com/media/1rKFURpStAa8VOiBLg/giphy.gif" width="700" height="400" />

<a id='reading'></a>
# Reading data files

In [None]:
d2015 = pd.read_csv("/kaggle/input/world-happiness/2015.csv")
d2016 = pd.read_csv("/kaggle/input/world-happiness/2016.csv")
d2017 = pd.read_csv("/kaggle/input/world-happiness/2017.csv")
d2018 = pd.read_csv("/kaggle/input/world-happiness/2018.csv")
d2019 = pd.read_csv("/kaggle/input/world-happiness/2019.csv")

<a id='2019'></a>
# 2019 data

In [None]:
coltoselect = ["rank","region","score",
                "gdp_per_capita","healthy_life_expectancy",
                "freedom_to_life_choise","generosity","corruption_perceptions"]
d2019.columns = ["rank","region","score",
                  "gdp_per_capita","social_support","healthy_life_expectancy",
                 "freedom_to_life_choise","generosity","corruption_perceptions"]
d2019.head()

<a id='2018'></a>
# 2018 data

In [None]:
d2018.columns = ["rank","region","score",
                  "gdp_per_capita","social_support","healthy_life_expectancy",
                 "freedom_to_life_choise","generosity","corruption_perceptions"]
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
d2018.head()

<a id='2017'></a>
# 2017 Data

In [None]:
d2017.drop(["Whisker.high","Whisker.low",
            "Family","Dystopia.Residual"],axis=1,inplace=True)
d2017.columns =  ["region","rank","score",
                  "gdp_per_capita","healthy_life_expectancy",
                 "freedom_to_life_choise","generosity","corruption_perceptions"]
d2017.head()

<a id='2016'></a>
# 2016 Data

In [None]:
d2016.drop(['Region','Lower Confidence Interval','Upper Confidence Interval',
            "Family",'Dystopia Residual'],axis=1,inplace=True)
d2016.columns = ["region","rank","score",
                  "gdp_per_capita","healthy_life_expectancy",
                 "freedom_to_life_choise","corruption_perceptions","generosity"]
d2016.head()

<a id='2015'></a>
# 2015 Data

In [None]:
d2015.drop(["Region",'Standard Error', 'Family', 'Dystopia Residual'],axis=1,inplace=True)
d2015.columns = ["region", "rank", "score", "gdp_per_capita",
"healthy_life_expectancy", "freedom_to_life_choise", "corruption_perceptions",
"generosity"]
d2015.head()

<a id='column_selection'></a>
# Selecting the columns to analyze

After renaming, we keep the following columns in our data:
- rank : Rank in Happiness 
- region : Country or region
- score : Happiness Score 
- gdp_per_capita : Gross domestic product per capita
- healthy_life_expectancy : Healthy life expectancy
- freedom_to_life_choise : Freedom to make life choices
- generosity : Generosity
- corruption_perceptions : Perceptions of government corruption
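
The cells above rename columns by position, which silently mislabels data if a file's column order differs from what is expected. A hedged alternative is to rename by header name. The raw 2019 header names below are an assumption (they are not shown in this notebook), so verify them against `d2019.columns` before relying on the mapping.

```python
import pandas as pd

# Assumed raw 2019 headers (not shown in this notebook); verify with d2019.columns
rename_2019 = {
    "Overall rank": "rank",
    "Country or region": "region",
    "Score": "score",
    "GDP per capita": "gdp_per_capita",
    "Social support": "social_support",
    "Healthy life expectancy": "healthy_life_expectancy",
    "Freedom to make life choices": "freedom_to_life_choise",
    "Generosity": "generosity",
    "Perceptions of corruption": "corruption_perceptions",
}

# One-row stand-in for the raw file, filled with placeholder values
raw = pd.DataFrame(
    [[1, "Country A", 7.0, 1.3, 1.5, 0.9, 0.6, 0.2, 0.4]],
    columns=list(rename_2019),
)

# rename() matches on header text, so column order no longer matters
renamed = raw.rename(columns=rename_2019)
print(renamed.columns.tolist())
```

A header that is missing from the file is simply left unrenamed, so a follow-up check that every target name is present is a sensible addition.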

In [None]:
d2015 = d2015.loc[:,coltoselect].copy()
d2016 = d2016.loc[:,coltoselect].copy()
d2017 = d2017.loc[:,coltoselect].copy()
d2018 = d2018.loc[:,coltoselect].copy()
d2019 = d2019.loc[:,coltoselect].copy()

# Adding Year column to each DataFrame

In [None]:
d2015["year"] = 2015
d2016["year"] = 2016
d2017["year"] = 2017
d2018["year"] = 2018
d2019["year"] = 2019

In [None]:
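# DataFrame.append was removed in pandas 2.0; pd.concat keeps the original row indices as append did
finaldf = pd.concat([d2015, d2016, d2017, d2018, d2019])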
finaldf.head()

In [None]:
d2015.sort_values("gdp_per_capita",inplace=True)
d2016.sort_values("gdp_per_capita",inplace=True)
d2017.sort_values("gdp_per_capita",inplace=True)
d2018.sort_values("gdp_per_capita",inplace=True)
d2019.sort_values("gdp_per_capita",inplace=True)

The DataFrame **finaldf** now holds the data of all years, with one additional column:
- year : Year of the report
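
A quick way to confirm that every year made it into the combined frame is to count rows per year. The sketch below uses two tiny placeholder frames rather than the real files, and the name `frames_by_year` is hypothetical.

```python
import pandas as pd

# Tiny stand-ins for the per-year frames (placeholder values)
frames_by_year = {
    2015: pd.DataFrame({"region": ["A", "B"], "score": [7.0, 5.0]}),
    2016: pd.DataFrame({"region": ["A", "B", "C"], "score": [7.1, 5.2, 4.0]}),
}

# Tag each frame with its year, then stack them row-wise
combined = pd.concat(
    [df.assign(year=year) for year, df in frames_by_year.items()]
)

# Number of rows contributed by each year
rows_per_year = combined["year"].value_counts().sort_index()
print(rows_per_year)
```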

<a id='missing'></a>

# Missing Data Analysis

The missingno package is used to visualize missing values. The bar plot comes first: a column with missing values shows a shorter bar than the others.

In [None]:
import missingno as msno
msno.bar(finaldf)
plt.show()

In [None]:
msno.matrix(finaldf)
plt.show()

In [None]:
finaldf.loc[finaldf.isnull().any(axis=1),:]

A sense of the completeness of the data can help inform decisions about how best to handle missing values.
The matrix plot is a representation of where data is missing in each column: any gaps in a column's bar are missing values. If a column is missing values for a very small number of records, then perhaps these are incomplete rows that should be discarded, or maybe we could attempt to predict the value, or simply set it to the most common/average value. On the other hand, if the column only has a value for half of all rows, then attempting to populate the missing values may just introduce a lot of noise.  

**There is only one record with a missing value, in column** <span style="color: blue;">**corruption_perceptions**</span>. <span style="color: green;">**Therefore we can drop that record.**</span> The output above shows that the row with index 19 holds the NaN value.
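
Besides the plots, the missing values can be counted directly with pandas. A minimal sketch on a three-row placeholder frame (the values are made up):

```python
import numpy as np
import pandas as pd

# Placeholder frame with one missing value
toy = pd.DataFrame({
    "region": ["A", "B", "C"],
    "score": [7.0, 6.5, 5.0],
    "corruption_perceptions": [0.4, np.nan, 0.1],
})

# Count of missing values in each column
missing_per_column = toy.isnull().sum()

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isnull().any(axis=1)]

# Copy of the frame without those rows
cleaned = toy.dropna()
print(missing_per_column)
```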

### Let us drop the NaN

In [None]:
finaldf.dropna(inplace=True)

<a id='line'></a>
# Scatter and Line Plots and Plotly

A line plot can be drawn with the Scatter object from Plotly graph objects by setting mode to "lines". The following cell draws one line per year for Happiness Score against GDP per Capita.


In [None]:
p15 = go.Scatter(
                    x = d2015.gdp_per_capita,
                    y = d2015.score,
                    mode = "lines",
                    name = "2015",
                    marker = dict(color = 'green'),
                    text= d2015.region)
p16 = go.Scatter(
                    x = d2016.gdp_per_capita,
                    y = d2016.score,
                    mode = "lines",
                    name = "2016",
                    marker = dict(color = 'red'),
                    text= d2016.region)

p17 = go.Scatter(
                    x = d2017.gdp_per_capita,
                    y = d2017.score,
                    mode = "lines",
                    name = "2017",
                    marker = dict(color = 'violet'),
                    text= d2017.region)

p18 = go.Scatter(
                    x = d2018.gdp_per_capita,
                    y = d2018.score,
                    mode = "lines",
                    name = "2018",
                    marker = dict(color = 'blue'),
                    text= d2018.region)

p19 = go.Scatter(
                    x = d2019.gdp_per_capita,
                    y = d2019.score,
                    mode = "lines",
                    name = "2019",
                    marker = dict(color = 'black'),
                    text= d2019.region)


data = [p15, p16, p17, p18, p19]
properties = dict(title = 'Happiness Score vs GDP per Capita',
              xaxis= dict(title= 'GDP per Capita',ticklen= 5,zeroline= False),
             yaxis= dict(title= 'Happiness Score',ticklen= 5,zeroline= False),
             )
fig = dict(data = data, layout = properties)
iplot(fig)

## <span style="color: green;">**Relationship between Happiness Score and GDP per Capita**</span>
The happiness score clearly tends to rise as GDP per capita rises. However, the figure above is cluttered because the yearly lines overlap.
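
The five traces in the cell above differ only in year and colour, so they could also be generated in a loop. The helper below is a sketch: `build_line_traces` is a hypothetical name, the frames are placeholders, and the traces are plain dicts in the same style as the `dict(data=..., layout=...)` figure used above.

```python
import pandas as pd

def build_line_traces(frames_by_year, colors):
    """Return one Plotly-style scatter trace dict per year."""
    traces = []
    for year, df in frames_by_year.items():
        # A line trace connects points in row order, so sort by the x column first
        ordered = df.sort_values("gdp_per_capita")
        traces.append(dict(
            type="scatter",
            mode="lines",
            name=str(year),
            x=ordered["gdp_per_capita"].tolist(),
            y=ordered["score"].tolist(),
            text=ordered["region"].tolist(),
            line=dict(color=colors[year]),
        ))
    return traces

# Placeholder frames standing in for d2015 and d2016
frames = {
    2015: pd.DataFrame({"region": ["A", "B"], "gdp_per_capita": [1.2, 0.4], "score": [7.0, 4.5]}),
    2016: pd.DataFrame({"region": ["A", "B"], "gdp_per_capita": [1.3, 0.5], "score": [7.1, 4.4]}),
}
traces = build_line_traces(frames, colors={2015: "green", 2016: "red"})
print([t["name"] for t in traces])
```

The resulting list can be used as the `data` entry of the figure dict in place of the hand-written `p15` to `p19` traces.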

<a id='score_gdp'></a>
## <span style="color: green;">**Scatter plots with Happiness Score on the Y axis, faceted by year**</span>

Scatter plots are easy to draw with Plotly Express. A useful feature of its scatter function is that it can also draw a <span style="color: green;">**trendline**</span>; the cells below use an ordinary least squares (OLS) fit.
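
To make the trendline less of a black box: an OLS trendline is the straight line that minimises the squared vertical distances to the points. Plotly Express delegates the fit to statsmodels; the sketch below reproduces the idea with `numpy.polyfit` on made-up points and is not the code Plotly runs.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Degree-1 least-squares fit returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Fitted values along the trendline
trend = slope * x + intercept
print(slope, intercept)
```

For a figure built with `trendline="ols"`, `px.get_trendline_results(fig)` returns the fitted statsmodels results if the actual coefficients are needed.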

## <span style="color: green;">**Scatter plot  Happiness Score vs GDP per Capita**</span>

In [None]:
fig = px.scatter(finaldf, x="gdp_per_capita", 
                 y="score",
                 facet_row="year",
                color="year",
                trendline= "ols")
fig.update(layout_coloraxis_showscale=False)
fig.update_traces(textposition='top center')
fig.update_layout(
    height=800,
    title_text='GDP per capita and Happiness Score'
)
fig.show()

## <span style="color: violet;">**Finding from Scatter plot  Happiness Score vs GDP per Capita**</span>

The figure above shows an approximately linear, positive relationship between GDP per capita and the happiness score in every year. A higher GDP per capita goes together with a higher happiness score.
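
The visual impression can be backed by a number. The sketch below computes the Pearson correlation per year on a small placeholder frame; with the real data the same expression could be applied to `finaldf`.

```python
import pandas as pd

# Placeholder data with the same column names as finaldf
toy = pd.DataFrame({
    "year": [2015] * 4 + [2016] * 4,
    "gdp_per_capita": [0.2, 0.6, 1.0, 1.4, 0.3, 0.7, 1.1, 1.5],
    "score": [3.5, 4.8, 5.9, 7.1, 3.6, 5.0, 6.1, 7.0],
})

# Pearson correlation between GDP per capita and score, one value per year
corr_by_year = {
    year: group["gdp_per_capita"].corr(group["score"])
    for year, group in toy.groupby("year")
}
print(corr_by_year)
```

Swapping `gdp_per_capita` for any other factor column gives the corresponding figure for the later scatter plots.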

<a id='score_life'></a>

## <span style="color: green;">**Scatter plot :  Happiness Score vs Healthy Life Expectancy**</span>

In [None]:

fig = px.scatter(finaldf, x="healthy_life_expectancy", 
                 y="score",
                 facet_row="year",
                color="year",
                trendline= "ols")
fig.update(layout_coloraxis_showscale=False)
fig.update_traces(textposition='top center')
fig.update_layout(
    height=800,
    title_text='Healthy Life Expectancy and Happiness Score'
)
fig.show()



## <span style="color: violet;">**Finding from Scatter plot  Happiness Score vs Healthy Life Expectancy**</span>

The figure above shows an approximately linear, positive relationship between Healthy Life Expectancy and the happiness score. A higher Healthy Life Expectancy goes together with a higher happiness score.

<a id='score_choise'></a>

## <span style="color: green;">**Scatter plot :  Happiness Score vs Freedom of Life Choices**</span>

In [None]:
fig = px.scatter(finaldf, x="freedom_to_life_choise", 
                 y="score",
                 facet_row="year",
                color="year",
                trendline= "ols")
fig.update(layout_coloraxis_showscale=False)
fig.update_traces(textposition='top center')
fig.update_layout(
    height=800,
    title_text='Freedom of Life Choices and Happiness Score'
)
fig.show()


## <span style="color: violet;">**Finding from Scatter plot  Happiness Score vs Freedom of Life Choices**</span>

The figure above shows an approximately linear, positive relationship between Freedom of Life Choices and the happiness score. More freedom of life choices goes together with a higher happiness score.

<a id='score_genero'></a>

## <span style="color: green;">**Scatter plot :  Happiness Score vs Generosity**</span>

In [None]:
fig = px.scatter(finaldf, x="generosity", 
                 y="score",
                 facet_row="year",
                color="year",
                trendline= "ols")
fig.update(layout_coloraxis_showscale=False)
fig.update_traces(textposition='top center')
fig.update_layout(
    height=800,
    title_text='Generosity and Happiness Score'
)
fig.show()


## <span style="color: violet;">**Finding from Scatter plot  Happiness Score vs Generosity**</span>

The trendlines in the figure above indicate a positive relationship between Generosity and the happiness score. Higher Generosity goes together with a higher happiness score.

<a id='score_corupt'></a>

## <span style="color: green;">**Scatter plot :  Happiness Score vs Corruption  Perceptions** </span>

In [None]:
fig = px.scatter(finaldf, x="corruption_perceptions", 
                 y="score",
                 facet_row="year",
                color="year",
                trendline= "ols")
fig.update(layout_coloraxis_showscale=False)
fig.update_traces(textposition='top center')
fig.update_layout(
    height=800,
    title_text='Perception of Government Corruption and Happiness Score'
)
fig.show()


## <span style="color: violet;">**Finding from Scatter plot  Happiness Score vs Corruption  Perceptions**</span>

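The figure above shows a positive relationship between corruption_perceptions and the happiness score. Because this factor measures the absence of corruption (trust in government), as described in the Content section, higher values mean less perceived corruption, and they go together with a higher happiness score.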

In [None]:
finaldf.head()

<a id='anim_score_gdp'></a>

## <span style="color: green;">**Animated Scatter plot  Happiness Score vs GDP per Capita**</span>

In [None]:
px.scatter(finaldf, x="gdp_per_capita", y="score", animation_frame="year",
           animation_group="region",
           size="rank", color="region", hover_name="region",
          trendline= "ols")

<a id='anim_score_life'></a>

## <span style="color: green;">**Animated Scatter plot :  Happiness Score vs Healthy Life Expectancy**</span>


In [None]:
px.scatter(finaldf, x="healthy_life_expectancy", y="score", animation_frame="year",
           animation_group="region",
           size="rank", color="region", hover_name="region")

<a id='anim_score_choise'></a>
## <span style="color: green;">**Animated  Scatter plot :  Happiness Score vs Freedom of Life Choices**</span>

In [None]:
px.scatter(finaldf, x="freedom_to_life_choise", y="score", animation_frame="year",
           animation_group="region",
           size="rank", color="region", hover_name="region")

<a id='anim_score_genero'></a>

## <span style="color: green;">**Animated  Scatter plot :  Happiness Score vs Generosity**</span>

In [None]:
px.scatter(finaldf, x="generosity", y="score", animation_frame="year",
           animation_group="region",
           size="rank", color="region", hover_name="region")

<a id='anim_score_corupt'></a>

## <span style="color: green;">**Animated  Scatter plot :  Happiness Score vs Corruption  Perceptions** </span>

In [None]:
px.scatter(finaldf, x="corruption_perceptions", y="score", animation_frame="year",
           animation_group="region",
           size="rank", color="region", hover_name="region")

<a id='reg_lgbm'></a>

# Regression Model Fitting using LGBM Regressor and permutation importance.

The following section fits an LGBM regressor and computes the permutation importance of the independent variables.
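
The idea behind permutation importance is simple: shuffle one feature column, and see how much worse the model predicts. The sketch below implements it by hand with NumPy on synthetic data. It is an illustration under stated assumptions, not what eli5 runs: the function name is hypothetical, a least-squares model stands in for the LGBM regressor, and importance is measured as the increase in mean squared error rather than the drop in the model's score.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = []
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between this column and the target
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_mse)
        importances.append(np.mean(increases))
    return np.array(importances)

# Synthetic data: y depends strongly on column 0 and not at all on column 1
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Simple least-squares model standing in for the LGBM regressor
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(data):
    return data @ coef

importances = permutation_importance_sketch(predict, X, y)
print(importances)
```

A feature the model relies on shows a large error increase when shuffled, while an irrelevant feature stays close to zero.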

In [None]:
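lgbm = LGBMRegressor(n_estimators=5000)
# Features: every column from gdp_per_capita through year
indData = finaldf.loc[:, "gdp_per_capita":"year"]
# Target: index the column instead of pop() so finaldf keeps its score column
depData = finaldf["score"]
lgbm.fit(indData, depData)
columns = indData.columns.to_list()
# Permutation importance is computed on the same data the model was fitted on
perm = PermutationImportance(lgbm, random_state=10).fit(indData, depData)
eli5.show_weights(perm, feature_names=columns)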

<a id='finding_from_permutation_importance'></a>
## <span style="color: violet;">**Finding from Permutation Importance**</span>

- GDP per capita has the highest impact on the Happiness Score.
- Perception of government corruption has the least impact on the Happiness Score.



<a id='finding_from_all_analysis'></a>
# <span style="color: Blue;">**Finding from All Analysis**</span>

- GDP per capita has the highest impact on the Happiness Score.
- Healthy life expectancy has the second highest impact on the Happiness Score.
- Freedom to make life choices comes after healthy life expectancy in its impact on the Happiness Score.
- Perception of government corruption has the least impact on the Happiness Score.