![](https://storage.googleapis.com/kaggle-datasets-images/569763/1033439/1ce64e5c026420074b8c4d17064c04e8/dataset-cover.png?t=2020-03-25-03-01-03)

# World Happiness 2020

### What we'll cover

1. [Introduction](#1)
    1. [Predictors (Independent Variables)](#1.1)
    1. [Target (Dependent Variable)](#1.2)
1. [Descriptive Statistics](#2)
1. [Plot Contents](#3)
    1. [Correlation Map](#3.1)
    1. [Cluster Map](#3.1.1)
    1. [Bubble Plot](#3.2)    
    1. [Box and Violin Plot](#3.3)    
    1. [Boxen Plot](#3.4)    
    1. [Pair Grid](#3.5)    
    1. [Pair Plot](#3.6)   
    1. [Choropleth Maps](#3.7) 
1. [References](#9)

<a id="1"></a> <br>
# **1. Introduction**
* The data comes from the World Happiness Report 2020, available as a Kaggle dataset.
* The World Happiness Report is a publication of the Sustainable Development Solutions Network, powered by data from the Gallup World Poll. 
* The report is a landmark survey of the state of global happiness that ranks 153 countries by how happy their citizens perceive themselves to be.
* The aim of this kernel is to explore the survey at a basic level: the happiness ranking and how it relates to GDP, healthy life expectancy and the other factors.

<a id="1.1"></a> <br>
## **1.1 Predictors (Independent Variables)**



* **Logged GDP per capita:** the natural logarithm of GDP per capita, where GDP per capita is measured in terms of Purchasing Power Parity (PPP).

* **Healthy life expectancy:** The time series of healthy life expectancy at birth is constructed from data in the World Health Organization (WHO) Global Health Observatory data repository.

* **Social support:** is the national average of the binary responses (0=no, 1=yes) to the Gallup World Poll (GWP) question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

* **Freedom to make life choices:** is the national average of binary responses to the GWP question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?” 

* **Generosity:** is the residual of regressing the national average of GWP responses to the question, “Have you donated money to a charity in the past month?” on GDP per capita. 

* **Perceptions of corruption:** are the average of binary answers to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” 
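
Generosity is defined as a regression residual: the part of the donation rate that GDP per capita does not explain. A minimal sketch of that idea follows, assuming an ordinary least-squares fit with an intercept and using log GDP per capita as the regressor (the report's exact specification may differ). The helper `residualize` and all numbers are hypothetical toy values, not data from the report.

```python
import numpy as np

def residualize(y, x):
    """Return the residuals of an ordinary least-squares fit y ~ a + b*x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Design matrix: an intercept column plus the regressor
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Hypothetical toy numbers, NOT taken from the report:
# share of respondents who donated last month, and log GDP per capita
donated = np.array([0.10, 0.25, 0.30, 0.45, 0.35])
log_gdp = np.array([7.5, 8.6, 9.4, 10.3, 10.8])

generosity = residualize(donated, log_gdp)
print(np.round(generosity, 3))
```

A positive residual means a country donates more than its income level alone would predict; by construction the residuals are uncorrelated with the regressor.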

<a id="1.2"></a> <br>
## **1.2 Target (Dependent Variable)**


**Ladder score:** people are asked to make a general evaluation of their life on a Cantril ladder scale from 0 to 10, with the worst possible life as 0 and the best possible life as 10. 

In [None]:
# Analysis and visualization libraries, preinstalled in the Kaggle Python Docker image
# (https://github.com/kaggle/docker-python)

import os
import warnings

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns  # statistical visualization
import matplotlib.pyplot as plt  # plotting backend used by seaborn
import plotly.express as px  # high-level interactive plots
import plotly.graph_objs as go  # low-level interactive figures
from plotly.offline import init_notebook_mode, iplot

# Enable offline plotly rendering inside the notebook (used by iplot)
init_notebook_mode(connected=True)

# Input data files are available in the read-only "../input/" directory;
# list every file under it
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Silence library warnings (e.g. deprecation notices) to keep the output readable
warnings.filterwarnings('ignore')

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('/kaggle/input/world-happiness-report-2020/WHR20_DataForFigure2.1.csv',index_col="Country name")

In [None]:
# A first look at the data
df.head()

In [None]:
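# Keep the target (Ladder score), the predictors and the region label,
# and drop the ladder-score uncertainty columns that sit between them.
# Note: label-based slices with .loc include both endpoints.
data = (
    df.loc[:, "Regional indicator":"Perceptions of corruption"]
      .drop(columns=df.loc[:, "Standard error of ladder score":"lowerwhisker"].columns)
)
data.head()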
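
To see which columns that selection keeps, here is a small sketch on a one-row placeholder frame. It assumes the first 13 columns of `WHR20_DataForFigure2.1.csv` appear in the order listed below; the zeros are placeholders, not real data.

```python
import pandas as pd

# Assumed column order of WHR20_DataForFigure2.1.csv (first 13 columns only)
columns = [
    "Country name", "Regional indicator", "Ladder score",
    "Standard error of ladder score", "upperwhisker", "lowerwhisker",
    "Logged GDP per capita", "Social support", "Healthy life expectancy",
    "Freedom to make life choices", "Generosity",
    "Perceptions of corruption", "Ladder score in Dystopia",
]
# One row of placeholder zeros stands in for the real data
toy = pd.DataFrame([[0] * len(columns)], columns=columns).set_index("Country name")

# Label slices with .loc include BOTH endpoints
kept = toy.loc[:, "Regional indicator":"Perceptions of corruption"]
dropped = toy.loc[:, "Standard error of ladder score":"lowerwhisker"].columns
selected = kept.drop(columns=dropped)
print(list(selected.columns))
```

Passing the dropped labels through `columns=` makes the intent explicit, instead of relying on `drop` iterating over a DataFrame's column names.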

In [None]:
# Check data types and missing values

data.info()

<a id="2"></a> <br>
## Descriptive Statistics

In [None]:
# Summary statistics for the numeric (float) variables

data.describe()

* What about the categorical variable?

In [None]:
# Summarize Regional indicator

reg_cnt = data["Regional indicator"].nunique()
reg_name = data["Regional indicator"].unique()
reg_cnt_val = data["Regional indicator"].value_counts()

print("Number of Regions: " + str(reg_cnt),"\n")
print("Name of Regions:\n " + str(reg_name),"\n")
print("Value Count of Regions:\n", reg_cnt_val)


<a id="3"></a> <br>
## Plot Contents

<a id="3.1"></a> <br>
### Correlation Map

In [None]:
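# Pairwise Pearson correlations of the numeric columns
# (numeric_only=True skips "Regional indicator"; pandas >= 2.0 raises an error without it)
cor = data.corr(numeric_only=True)

# Annotated heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cor, square=True, cmap="coolwarm", annot=True, linewidths=0.5)

plt.show()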
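
`DataFrame.corr` uses the Pearson coefficient by default: the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand on hypothetical toy values (not the report's data) and checks it against pandas; `pearson_r` is an illustrative helper, not part of any library.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: centred cross-products over the product of spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical toy values, not the report's data
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "b": [2.0, 1.0, 4.0, 3.0, 5.0],
                    "region": list("xxyyz")})

manual = pearson_r(toy["a"], toy["b"])
# numeric_only=True skips the text column (available in pandas >= 1.5)
from_pandas = toy.corr(numeric_only=True).loc["a", "b"]
print(round(manual, 3), round(from_pandas, 3))  # 0.8 0.8
```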

* **Does the correlation map change across regions?**

In [None]:
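# One correlation heatmap per region.
# Some regions contain only a few countries, so their correlations are noisy.
for i in reg_name:
    fig, ax = plt.subplots()
    corc = data[data["Regional indicator"] == i].corr(numeric_only=True)
    sns.heatmap(corc, square=True, cmap="coolwarm", annot=True, linewidths=0.5, ax=ax)
    ax.set_title(i)
    plt.show()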

<a id="3.1.1"></a> <br>
### Cluster Map

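`clustermap` draws a matrix (here the correlation matrix) with its rows and columns reordered by hierarchical clustering, so similar variables end up next to each other and the dendrograms show how they group. The `metric` parameter sets the distance metric used for the clustering (Euclidean by default) and `method` sets the linkage method (average by default).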

In [None]:
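# Correlation matrix with the variables reordered by hierarchical clustering.
# z_score is left off so the colors show the correlations themselves, centred at 0.
sns.clustermap(data.corr(numeric_only=True), center=0, cmap="vlag", linewidths=.75)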
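
The row order of a cluster map comes from a hierarchical clustering tree. A minimal sketch of that step with SciPy follows, using the same defaults as `clustermap` (average linkage on Euclidean distances); the toy matrix is hypothetical, and seaborn's internals may differ in details such as optional use of `fastcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical toy matrix: 4 rows (e.g. countries) x 2 standardized features
X = np.array([
    [0.0, 0.1],   # row 0
    [5.0, 5.2],   # row 1
    [0.2, 0.0],   # row 2, close to row 0
    [5.1, 4.9],   # row 3, close to row 1
])

# Average linkage on Euclidean distances
Z = linkage(X, method="average", metric="euclidean")
order = leaves_list(Z)  # leaf order of the dendrogram, used to reorder rows
print(order)
```

Similar rows (0 and 2, 1 and 3) end up adjacent in `order`, which is why a cluster map places similar variables or countries next to each other.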

It can also cluster the variables and the countries at the same time, which helps you find countries that are similar according to the input features.

In [None]:
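# Cluster countries (rows) and variables (columns) together.
# z_score=1 standardizes each column, so features on different scales
# (e.g. life expectancy in years vs. generosity) contribute comparably.
sns.clustermap(data.select_dtypes(include="float"), center=0, cmap="vlag", z_score=1, linewidths=.75, figsize=(10, 50))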
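
Standardizing puts every feature on the same footing before distances are computed. A sketch of column-wise z-scores with pandas on hypothetical toy values; as far as we know this mirrors what `z_score=1` does in `clustermap` (subtract the mean, divide by the sample standard deviation), while `z_score=0` applies the same transform to each row.

```python
import pandas as pd

# Hypothetical toy features on very different scales (not the report's data)
toy = pd.DataFrame({
    "Logged GDP per capita": [7.0, 9.0, 11.0],
    "Healthy life expectancy": [50.0, 60.0, 70.0],
    "Generosity": [-0.1, 0.0, 0.1],
}, index=["A", "B", "C"])

# Column-wise z-scores: subtract each column's mean, divide by its std (ddof=1)
z = (toy - toy.mean()) / toy.std()
print(z.round(2))
```

After the transform each column has mean 0 and standard deviation 1, so life expectancy no longer dominates the distances just because its numbers are larger.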

<a id="3.2"></a> <br>
### **Bubble Plot**

A bubble plot extends the basic two-dimensional scatter chart with two more dimensions, circle size (here Social support) and color (here the regional indicator), so more of the data can be shown in a single chart.

The `hover_name` parameter of `px.scatter` sets the title of the tooltip that appears when you hover over a point; here it is the country name (the DataFrame index).

The plot shows a positive relationship between Ladder score and Logged GDP per capita, and Ladder score also tends to rise with Social support (circle size). Did you notice the big gap between Western Europe (blue) and Sub-Saharan Africa (pink)?

In [None]:
fig = px.scatter(data, x="Logged GDP per capita", y="Ladder score", size="Social support", color="Regional indicator", hover_name=data.index, size_max=20)

fig.show()

<a id="3.3"></a> <br>
### Box and Violin Plot

Box plots are commonly used to detect outliers and to display the distribution of the data. Violin plots are less common but serve the same purpose: they convey roughly the same information about outliers and spread, and additionally show the full shape of the distribution through a kernel density estimate.
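
The usual box-plot outlier rule flags points beyond 1.5 interquartile ranges (IQR) from the quartiles; seaborn's `boxplot` exposes that multiplier as `whis` (default 1.5). A minimal sketch of the rule in NumPy, with a hypothetical helper name and a toy sample:

```python
import numpy as np

def tukey_outliers(values, whis=1.5):
    """Flag points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < low) | (values > high)], (low, high)

# Hypothetical toy sample with one obvious outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
outliers, fences = tukey_outliers(sample)
print(outliers, fences)
```

Here Q1 = 3 and Q3 = 7, so the fences are -3 and 13 and only 30 is flagged. Quartile conventions vary slightly between libraries, so the fences may differ a little from what a given plot draws.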


In [None]:
sns.boxplot(x=data["Logged GDP per capita"], y = data["Regional indicator"],palette = "pastel")

In [None]:
sns.violinplot(x=data["Logged GDP per capita"], y = data["Regional indicator"], scale = "width",palette="Set3")

<a id="3.4"></a> <br>
### Boxen Plot

In [None]:
sns.set(style="whitegrid")
sns.boxenplot(x=data["Logged GDP per capita"], y = data["Regional indicator"],scale="linear")
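
A boxen (letter-value) plot draws nested boxes at successively more extreme quantiles: the innermost box spans the quartiles as in a box plot, the next spans the 1/8 and 7/8 quantiles, then 1/16 and 15/16, and so on, which shows the tails of larger samples in more detail. The sketch below computes those quantile pairs for a fixed number of levels `k`; the helper `letter_values` is hypothetical, and how seaborn chooses the number of levels is not modeled here.

```python
import numpy as np

def letter_values(values, k):
    """Return (lower, upper) quantile pairs at tail areas 1/4, 1/8, ..., 1/2**(k+1)."""
    values = np.asarray(values, dtype=float)
    pairs = []
    for level in range(1, k + 1):
        tail = 0.5 ** (level + 1)  # 0.25, 0.125, 0.0625, ...
        lo, hi = np.quantile(values, [tail, 1 - tail])
        pairs.append((lo, hi))
    return pairs

# Hypothetical toy sample: 0, 1, ..., 16
sample = np.arange(17)
for lo, hi in letter_values(sample, k=3):
    print(lo, hi)  # 4.0 12.0, then 2.0 14.0, then 1.0 15.0
```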

### Swarm Plot

In [None]:
sns.swarmplot(x=data["Logged GDP per capita"], y = data["Regional indicator"])

<a id="3.5"></a> <br>
### Pair Grid

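`PairGrid` gives an easy summary of all the numeric variables at once: here scatter plots above the diagonal, bivariate KDE contours below it, and the univariate KDE of each variable on the diagonal.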

In [None]:
disp = sns.PairGrid(data, diag_sharey=False)
disp.map_upper(sns.scatterplot)
disp.map_lower(sns.kdeplot, colors="C0")
disp.map_diag(sns.kdeplot, lw=2)
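
The KDE curves on the diagonal (and the violin shapes earlier) are kernel density estimates: an average of Gaussian bumps centred on each observation. `sns.kdeplot` picks the bump width with Scott's rule by default. A minimal one-dimensional sketch of that estimator in NumPy, with a hypothetical helper name and toy sample:

```python
import numpy as np

def gaussian_kde_1d(sample, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid` using Scott's rule."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    # Scott's rule in one dimension: bandwidth = std * n ** (-1/5)
    h = sample.std(ddof=1) * n ** (-1 / 5)
    # Average of Gaussian bumps centred on each observation
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Hypothetical toy sample
sample = np.array([1.0, 2.0, 2.5, 3.0, 7.0])
grid = np.linspace(-10, 20, 3001)
density = gaussian_kde_1d(sample, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann sum, close to 1
print(round(area, 3))
```

A larger bandwidth gives smoother curves and a smaller one follows individual points more closely; seaborn's `bw_adjust` scales the rule-of-thumb value.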

<a id="3.6"></a> <br>
### Pair Plot

`pairplot` is a quicker, higher-level way to build a similar grid; `hue="Regional indicator"` colors each country by its region.

In [None]:
sns.pairplot(data, hue = "Regional indicator")

<a id="3.7"></a> <br>
### Choropleth Maps

An interactive choropleth map shows the happiness (Ladder) score of every country; hover over a country to see its name and score.

In [None]:
# Reload the data with "Country name" as a regular column
# (a new name avoids overwriting the `data` frame used above)
whr = pd.read_csv('/kaggle/input/world-happiness-report-2020/WHR20_DataForFigure2.1.csv')

# One choropleth trace: countries matched by name, colored by Ladder score
choropleth = go.Choropleth(
    locations=whr['Country name'],
    locationmode='country names',
    z=whr['Ladder score'],
    text=whr['Country name'],
    colorbar={'title': 'Happiness'},
)
layout = dict(title='Happiness Score 2020',
              geo=dict(showframe=False,
                       showocean=False,
                       showlakes=True,
                       showcoastlines=True,
                       projection={'type': 'natural earth'}))
map_ = go.Figure(data=[choropleth], layout=layout)
iplot(map_)

**I hope you find this kernel useful and enjoyable.**

**Your comments and feedback are most welcome.**

<a id="9"></a> <br> 
## **References:**

https://happiness-report.s3.amazonaws.com/2020/WHR20.pdf

https://en.wikipedia.org/wiki/World_Happiness_Report

https://seaborn.pydata.org/

https://plotly.com/python/