### How Does Human Freedom Affect World Happiness?

Ruut Veenhoven studied the relationship between freedom and happiness in 46 nations in the early 1990s. His research showed that freedom does not always breed happiness and suggested that economic freedom deserves priority
(link to the paper: https://personal.eur.nl/veenhoven/Pub2000s/2000a-full.pdf).
This project aims to utilize data available from recent years to explore the relationship between world happiness and human freedom using machine learning and visualization.

Datasets are:
    1- Human Freedom Dataset
    2- World Happiness Dataset
The project is divided into sections:

    1- EDA (Exploratory Data Analysis)
        1.a Human Freedom Dataset
            1.a.1 Data Exploration
                1.a.1.1 Box plot of dataset distribution
            1.a.2 Preprocessing
            1.a.3 Target Feature Extraction
        1.b World Happiness Dataset
            1.b.1 Data Exploration
                1.b.1.1 Box plot of dataset distribution
            1.b.2 Preprocessing
            1.b.3 Target Feature Extraction

    2- Data Visualization
        2.a Scatter plot: How does generosity affect the World Happiness Score?
        2.b Bubble plot: Does health threaten happiness?
        2.c Radar plot: How do different freedom scores affect world happiness?
        2.d Heatmap: How do world happiness factors and human freedom factors correlate?

In [1]:
#import packages and scripts
from IPython.core.display import HTML
from string import Template
import pandas as pd
import os
import json, random
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')
#get the d3 host remotely
HTML('<script src=https://d3js.org/d3.v4.min.js></script>')


In [2]:
HTML('''<h1>Data Viz</h1>''')

In [3]:
os.getcwd()


'/Users/skydiver/Desktop/viz'

### Data Exploration

#### Human Freedom Dataset

The Human Freedom Index measures economic freedoms such as the freedom to trade or to use sound money, and it captures the degree to which people are free to enjoy the major freedoms often referred to as civil liberties—freedom of speech, religion, association, and assembly— in the countries in the survey. In addition, it includes indicators on rule of law, crime and violence, freedom of movement, and legal discrimination against same-sex relationships. We also include nine variables pertaining to women-specific freedoms that are found in various categories of the index.[1]
[1]: Ian Vásquez and Tanja Porčnik, The Human Freedom Index 2018: A Global Measurement of Personal, Civil, and Economic Freedom (Washington: Cato Institute, Fraser Institute, and the Friedrich Naumann Foundation for Freedom, 2018).

In [4]:
#load the Human Freedom data
hmn_frdm= pd.read_csv("new_data/hf.csv")

In [5]:
#list the column names
hmn_frdm.columns

Index(['Year', 'ISO_Code', 'Countries', 'PERSONAL FREEDOM (Score)',
       'ECONOMIC FREEDOM (Score)', 'HUMAN FREEDOM (Score)',
       'HUMAN FREEDOM (Rank)', 'HUMAN FREEDOM (Quartile)'],
      dtype='object')

In [6]:
hmn_frdm.rename(columns = {'Year':'year','Countries': 'Country', 'PERSONAL FREEDOM (Score)':'pf_score',
       'ECONOMIC FREEDOM (Score)':'ef_score', 'HUMAN FREEDOM (Score)':'hf_score',
       'HUMAN FREEDOM (Rank)':'hf_rank', 'HUMAN FREEDOM (Quartile)':'HF_quartile'}, 
                                 inplace = True)
hmn_frdm.head()

Unnamed: 0,year,ISO_Code,Country,pf_score,ef_score,hf_score,hf_rank,HF_quartile
0,2015,ALB,Albania,7.78,7.54,7.66,47.0,2.0
1,2015,DZA,Algeria,5.25,4.84,5.05,153.0,4.0
2,2015,AGO,Angola,5.72,5.4,5.56,142.0,4.0
3,2015,ARG,Argentina,8.05,4.88,6.46,108.0,3.0
4,2015,ARM,Armenia,7.11,7.6,7.36,54.0,2.0


In [8]:
hmn_frdm.shape

(1272, 8)

In [9]:
hmn_frdm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1272 entries, 0 to 1271
Data columns (total 8 columns):
year           1272 non-null int64
ISO_Code       1272 non-null object
Country        1272 non-null object
pf_score       1216 non-null float64
ef_score       1216 non-null float64
hf_score       1216 non-null float64
hf_rank        1216 non-null float64
HF_quartile    1216 non-null float64
dtypes: float64(5), int64(1), object(2)
memory usage: 79.6+ KB


The output above shows that the score, rank, and quartile columns each have 1,216 non-null values out of 1,272 rows, so 56 values are missing in each of them.

In [10]:
hmn_frdm.describe()

Unnamed: 0,year,pf_score,ef_score,hf_score,hf_rank,HF_quartile
count,1272.0,1216.0,1216.0,1216.0,1216.0,1216.0
mean,2011.5,7.211628,6.776826,6.994128,76.410362,2.5
std,2.292189,1.365294,0.882497,1.020004,44.140519,1.117758
min,2008.0,2.86,2.92,4.04,1.0,1.0
25%,2009.75,6.16,6.24,6.3475,38.75,1.75
50%,2011.5,7.2,6.89,6.93,75.5,3.0
75%,2013.25,8.47,7.3925,7.9125,114.0,3.0
max,2015.0,9.58,9.19,9.14,159.0,4.0


From the summary above, the dataset covers the years 2008 to 2015.

In [11]:
df_num = hmn_frdm.select_dtypes(include = ['float64', 'int64'])
df_num.columns

Index(['year', 'pf_score', 'ef_score', 'hf_score', 'hf_rank', 'HF_quartile'], dtype='object')

In [60]:
from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML


init_notebook_mode(connected=True)
import plotly.graph_objs as go

#select only numerical data from the data frame and check the distribution
df_num = hmn_frdm.select_dtypes(include = ['float64', 'int64'])


#one array per column: transpose so that zip pairs each column name with its own values
score_cols = ['pf_score', 'ef_score', 'hf_score', 'HF_quartile']
x_data = score_cols
y_data = df_num[score_cols].values.T

colors = ['rgba(83, 134, 234, 0.5)', 'rgba(215, 114, 14, 0.5)', 'rgba(14, 110, 101, 0.5)', 'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)', 'rgba(127, 96, 0, 0.5)','rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)', 'rgba(44, 160, 101, 0.5)', 'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)']

traces = []

for xd, yd, cls in zip(x_data, y_data, colors):
        traces.append(go.Box(
            y=yd,
            name=xd,
            boxpoints='all',
            jitter=0.5,
            whiskerwidth=0.2,
            fillcolor=cls,
            marker=dict(
                size=5,
            ),
            line=dict(width=1),
        ))

layout = go.Layout(
    title='Distribution of Numerical data in the Human Freedom dataset',
    yaxis=dict(
        autorange=True,
        showgrid=True,
        zeroline=True,
        dtick=8,
        gridcolor='rgb(255, 255, 255)',
        gridwidth=1,
        zerolinecolor='rgb(255, 255, 255)',
        zerolinewidth=2,
    ),
    margin=dict(
        l=40,
        r=80,
        b=120,
        t=100,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
    showlegend=False
)

fig = go.Figure(data=traces, layout=layout)
iplot(fig)
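
A note on the loop above: `zip` pairs each column name with one element of `y_data`, so `y_data` needs to hold one array per column. Iterating over a 2-D NumPy array yields rows, not columns, which is easy to miss. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset) to show the difference.

```python
import pandas as pd

# Toy frame standing in for the numeric score columns (values are made up).
toy = pd.DataFrame({
    "pf_score": [7.0, 5.0, 6.0],
    "ef_score": [8.0, 4.0, 5.0],
})

rows = list(toy.values)      # iterating a 2-D array yields one array per ROW
cols = list(toy.values.T)    # transposing first yields one array per COLUMN

# Pair each column name with that column's values, as a per-column box trace needs.
per_column = {name: values.tolist() for name, values in zip(toy.columns, cols)}
print(per_column)
```

With the transposed array, each box trace receives the full distribution of a single column.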

##### Preprocessing

In [13]:
hmn_frdm.columns

Index(['year', 'ISO_Code', 'Country', 'pf_score', 'ef_score', 'hf_score',
       'hf_rank', 'HF_quartile'],
      dtype='object')

In [14]:
#check the percentage of NAs in every column
na = hmn_frdm.isnull().sum()/len(hmn_frdm)
#filter columns with NA above 0.3 of the data
na_3 = na[na>0.3]
#delete columns with NA above 0.3 of the data
for ind in na_3.index:
    del hmn_frdm[ind]
#double check if NAs were removed
na_after = hmn_frdm.isnull().sum()/len(hmn_frdm)
#check if there are still columns with NA above 0.3 of the data
na_after_3 = na_after[na_after>0.3]
na_after_3.index

Index([], dtype='object')
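
The threshold step above can also be wrapped in a small helper. This is only a sketch: the function name `drop_sparse_columns` and the toy data are hypothetical, and the 0.3 default simply mirrors the threshold used in this notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, threshold=0.3):
    """Return a copy of df without columns whose NA share exceeds threshold."""
    na_share = df.isnull().mean()                  # fraction of missing values per column
    keep = na_share[na_share <= threshold].index   # columns at or below the threshold
    return df[keep].copy()

# Toy data: column "b" is 75% missing, column "a" is 25% missing.
toy = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": ["w", "x", "y", "z"],
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```

Returning a copy leaves the original frame untouched, which makes it easier to rerun the cell.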

In [15]:
hmn_frdm.shape

(1272, 8)

In [16]:
#mean of the numeric columns only
hmn_frdm.mean(numeric_only=True)

year           2011.500000
pf_score          7.211628
ef_score          6.776826
hf_score          6.994128
hf_rank          76.410362
HF_quartile       2.500000
dtype: float64

No column exceeded the 0.3 NA threshold, so the dataset keeps all 8 columns. I will now fill the remaining NAs with the column mean.

In [17]:
#fill NA with the column mean (numeric columns only)
hf_df_new = hmn_frdm.fillna(hmn_frdm.mean(numeric_only=True),axis=0)
hf_df_new.isnull().sum()

year           0
ISO_Code       0
Country        0
pf_score       0
ef_score       0
hf_score       0
hf_rank        0
HF_quartile    0
dtype: int64

In [18]:
hf_df_new.head()

Unnamed: 0,year,ISO_Code,Country,pf_score,ef_score,hf_score,hf_rank,HF_quartile
0,2015,ALB,Albania,7.78,7.54,7.66,47.0,2.0
1,2015,DZA,Algeria,5.25,4.84,5.05,153.0,4.0
2,2015,AGO,Angola,5.72,5.4,5.56,142.0,4.0
3,2015,ARG,Argentina,8.05,4.88,6.46,108.0,3.0
4,2015,ARM,Armenia,7.11,7.6,7.36,54.0,2.0


Next, I will divide the human freedom score into High, Medium, and Low classes.

#### Feature Extraction

In [19]:
import numpy as np
#flag scores above 7 as "High" and scores below 5.5 as "Low"; all other rows get "0"
hf_df_new['class_high'] = np.where(hf_df_new['hf_score']>7 , "High","0")
hf_df_new['class_low'] = np.where(hf_df_new['hf_score']<5.5 , "Low","0")
#combine both flags into one label: "High - 0", "0 - Low" or "0 - 0"
hf_df_new['HF_class'] = hf_df_new['class_high'].str.cat(hf_df_new['class_low'], sep=' - ')
#map the combined labels to the final class names
hf_df_new = hf_df_new.replace(['0 - 0', 'High - 0','0 - Low'], 
                     ['Med','High','Low']) 
hf_df_new['HF_class'].value_counts()

Med     607
High    562
Low     103
Name: HF_class, dtype: int64
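
The same three-way split can be written in one step. The sketch below is an alternative, not the method used in this notebook: it relies on `np.select`, and the helper name `score_to_class` is hypothetical. The first three scores are taken from the rows shown earlier; 7.0 and 5.5 are added to show that the boundaries fall into Med.

```python
import numpy as np
import pandas as pd

def score_to_class(scores, low, high):
    """Label scores strictly above `high` as High, strictly below `low` as Low, else Med."""
    conditions = [scores > high, scores < low]
    return np.select(conditions, ["High", "Low"], default="Med")

toy = pd.DataFrame({"hf_score": [7.66, 5.05, 5.56, 7.0, 5.5]})
toy["HF_class"] = score_to_class(toy["hf_score"], low=5.5, high=7)
print(toy)
```

This avoids the two temporary columns and the string replacement, and the same helper could be reused for the happiness score with different thresholds.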

In [20]:
#delete redundant columns
del hf_df_new['class_high']
del hf_df_new['class_low']
hf_df_new.head()

Unnamed: 0,year,ISO_Code,Country,pf_score,ef_score,hf_score,hf_rank,HF_quartile,HF_class
0,2015,ALB,Albania,7.78,7.54,7.66,47.0,2.0,High
1,2015,DZA,Algeria,5.25,4.84,5.05,153.0,4.0,Low
2,2015,AGO,Angola,5.72,5.4,5.56,142.0,4.0,Med
3,2015,ARG,Argentina,8.05,4.88,6.46,108.0,3.0,Med
4,2015,ARM,Armenia,7.11,7.6,7.36,54.0,2.0,High


#### World Happiness Dataset

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll.

In [21]:
#data for year 2015
wrld_hpns_2015=pd.read_csv("https://storage.googleapis.com/kaggle-datasets/894/2508/2015.csv?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1553104496&Signature=mW%2BX79jJHqVIeRa%2F6yu0ozbg9Iu5lLz4GXcDmEzzjot4ROQHmpMSMopcTEZCDaUUCSVB5LwLbblHRknTc3eXnMImhxWz2a2e7wgKTMFhmWnKBG8mpwdMhzlT9vAmv%2F3FoFrDvm8l2EN83eLOHPFXwgnq83woc6heutZdQWPGpcBvev2VbT6msH7v%2Fyh7qVNz7Cn37rRan3Z9nWzO9gUgVLJ9hKWwzQ9pbe73KwjHeZpjI1ASei9qznzFeyZN1xQ0IU5YVp3eI32JKG%2BlK4GKlmqbR1OBG%2Fmpu4CcjajnKGEud4UwWGgzpPPAWcsKprg4DQIB9AQWK6tiuGtv1pi1Nw%3D%3D")
del wrld_hpns_2015['Standard Error']
wrld_hpns_2015['year']='2015'
wrld_hpns_2015.head(2)

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,year
0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015
1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015


In [22]:
#data for year 2016
wrld_hpns_2016=pd.read_csv("https://storage.googleapis.com/kaggle-datasets/894/2508/2016.csv?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1553105539&Signature=gl7bYWXtkkjUKvuEupXmqD5t4P58XWXWRKALGR5GC2vRtLa1QOFtWQ6%2FBrE6JmNGrz%2Bv1qTnSIqFYGQW0jDPFid%2BoiD3tDY6pz2ghDL%2B4SsNU9Twlgtf3MYn76RGapPqQt312%2FIVxt0fJd1P%2BVOC6ga9tPoh4h2x3C9JxgOP8J6VYuAMi9SWqRHyBafd8C8DNKeW4upS5REHwgAQhMstCKT1jJH6d7qmvDqQcX070AETyQClz4LJSbRIW11OEZxWrbwZWfG5vPh9q4wTXg5Rv9NYorQi9ko0EZwQsCIlUGlEp7Tz%2BuUTUmjCsMVGVTx8S1cvcgLzo03yZEmwWpQX2Q%3D%3D")
del wrld_hpns_2016['Lower Confidence Interval']
del wrld_hpns_2016['Upper Confidence Interval']
wrld_hpns_2016['year']='2016'
wrld_hpns_2016.head(2)

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,year
0,Denmark,Western Europe,1,7.526,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939,2016
1,Switzerland,Western Europe,2,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463,2016


In [23]:
#after importing the two yearly datasets, concatenate them (DataFrame.append was removed in pandas 2.0)
wrld_hpns = pd.concat([wrld_hpns_2015, wrld_hpns_2016])
wrld_hpns.head(2)

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,year
0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015
1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015


In [24]:
wrld_hpns.columns

Index(['Country', 'Region', 'Happiness Rank', 'Happiness Score',
       'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
       'Freedom', 'Trust (Government Corruption)', 'Generosity',
       'Dystopia Residual', 'year'],
      dtype='object')

In [25]:
wrld_hpns.shape

(315, 12)

In [26]:
wrld_hpns.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 315 entries, 0 to 156
Data columns (total 12 columns):
Country                          315 non-null object
Region                           315 non-null object
Happiness Rank                   315 non-null int64
Happiness Score                  315 non-null float64
Economy (GDP per Capita)         315 non-null float64
Family                           315 non-null float64
Health (Life Expectancy)         315 non-null float64
Freedom                          315 non-null float64
Trust (Government Corruption)    315 non-null float64
Generosity                       315 non-null float64
Dystopia Residual                315 non-null float64
year                             315 non-null object
dtypes: float64(8), int64(1), object(3)
memory usage: 32.0+ KB


In [27]:
wrld_hpns.describe()

Unnamed: 0,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
count,315.0,315.0,315.0,315.0,315.0,315.0,315.0,315.0,315.0
mean,79.238095,5.378949,0.899837,0.892647,0.594054,0.399896,0.140532,0.239957,2.212032
std,45.538922,1.141531,0.41078,0.286718,0.24079,0.150684,0.11549,0.130077,0.558728
min,1.0,2.839,0.0,0.0,0.0,0.0,0.0,0.0,0.32858
25%,40.0,4.51,0.5949,0.73988,0.419645,0.29589,0.061315,0.15143,1.884135
50%,79.0,5.286,0.97306,0.93793,0.64045,0.41319,0.10613,0.21854,2.21126
75%,118.5,6.269,1.229,1.08736,0.78764,0.51673,0.17861,0.31145,2.56347
max,158.0,7.587,1.82427,1.40223,1.02525,0.66973,0.55191,0.81971,3.83772


In [28]:
wrld_hpns.rename(columns = {"Happiness Rank": "HappinessRank",'Happiness Score':"HappinessScore",
       'Economy (GDP per Capita)':"Economy",'Health (Life Expectancy)':"health",
       'Trust (Government Corruption)':"Trust"}, 
                                 inplace = True)

In [29]:
wrld_hpns.head()

Unnamed: 0,Country,Region,HappinessRank,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year
0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015
1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015
2,Denmark,Western Europe,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015
3,Norway,Western Europe,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015
4,Canada,North America,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2015


In [30]:
wrld_hpns.columns

Index(['Country', 'Region', 'HappinessRank', 'HappinessScore', 'Economy',
       'Family', 'health', 'Freedom', 'Trust', 'Generosity',
       'Dystopia Residual', 'year'],
      dtype='object')

In [31]:
wrold_hpns_viz= wrld_hpns[['HappinessScore', 'Economy',
       'Family', 'health', 'Freedom', 'Trust', 'Generosity',
       'Dystopia Residual']]
wrold_hpns_viz.head()


Unnamed: 0,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual
0,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


In [32]:
from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML


init_notebook_mode(connected=True)
import plotly.graph_objs as go

#select only numerical data from the data frame and check the distribution
wh_num = wrold_hpns_viz.select_dtypes(include = ['float64', 'int64'])

#one array per column: transpose so that zip pairs each column name with its own values
x_data = wrold_hpns_viz.columns
y_data = wrold_hpns_viz.values.T

colors = ['rgba(83, 134, 234, 0.5)', 'rgba(215, 114, 14, 0.5)', 'rgba(14, 110, 101, 0.5)', 'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)', 'rgba(127, 96, 0, 0.5)','rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)', 'rgba(44, 160, 101, 0.5)', 'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)']

traces = []

for xd, yd, cls in zip(x_data, y_data, colors):
        traces.append(go.Box(
            y=yd,
            name=xd,
            boxpoints='all',
            jitter=0.5,
            whiskerwidth=0.2,
            fillcolor=cls,
            marker=dict(
                size=5,
            ),
            line=dict(width=1),
        ))

layout = go.Layout(
    title='Distribution of Numerical data in the World happiness dataset',
    yaxis=dict(
        autorange=True,
        showgrid=True,
        zeroline=True,
        dtick=8,
        gridcolor='rgb(255, 255, 255)',
        gridwidth=1,
        zerolinecolor='rgb(255, 255, 255)',
        zerolinewidth=2,
    ),
    margin=dict(
        l=40,
        r=80,
        b=120,
        t=100,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
    showlegend=False
)

fig = go.Figure(data=traces, layout=layout)
iplot(fig)

##### Preprocessing

In [33]:
#check the percentage of NAs in every column
na_wh = wrld_hpns.isnull().sum()/len(wrld_hpns)
#filter columns with NA above 0.3 of the data
na_wh_3 = na_wh[na_wh>0.3]
#delete columns with NA above 0.3 of the data
for ind in na_wh_3.index:
    del wrld_hpns[ind]
#double check if NAs were removed
wh_na_after = wrld_hpns.isnull().sum()/len(wrld_hpns)
#check if there are still columns with NA above 0.3 of the data
wh_na_after_3 = wh_na_after[wh_na_after>0.3]
wh_na_after_3.index

Index([], dtype='object')

In [34]:
wrld_hpns.shape

(315, 12)

In [35]:
(wh_num.mean())

HappinessScore       5.378949
Economy              0.899837
Family               0.892647
health               0.594054
Freedom              0.399896
Trust                0.140532
Generosity           0.239957
Dystopia Residual    2.212032
dtype: float64

In [36]:
#fill NA with mean value
wh_df_new = wrld_hpns.fillna(wrld_hpns.mean(numeric_only=True),axis=0)
wh_df_new.isnull().sum()

Country              0
Region               0
HappinessRank        0
HappinessScore       0
Economy              0
Family               0
health               0
Freedom              0
Trust                0
Generosity           0
Dystopia Residual    0
year                 0
dtype: int64

In [37]:
wh_df_new.head()

Unnamed: 0,Country,Region,HappinessRank,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year
0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015
1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015
2,Denmark,Western Europe,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015
3,Norway,Western Europe,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015
4,Canada,North America,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2015


#### Feature Extraction

In [38]:
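import numpy as np
#flag scores above 6.5 as "High" and scores below 5 as "Low"; all other rows get "0"
wh_df_new['class_high'] = np.where(wh_df_new['HappinessScore']>6.5 , "High","0")
wh_df_new['class_low'] = np.where(wh_df_new['HappinessScore']<5 , "Low","0")
#combine both flags into one label: "High - 0", "0 - Low" or "0 - 0"
wh_df_new['WH_class'] = wh_df_new['class_high'].str.cat(wh_df_new['class_low'], sep=' - ')
#map the combined labels to the final class names
wh_df_new = wh_df_new.replace(['0 - 0', 'High - 0','0 - Low'], 
                     ['Med','High','Low']) 
wh_df_new['WH_class'].value_counts()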

Med     132
Low     123
High     60
Name: WH_class, dtype: int64

In [39]:
#delete redundant columns
del wh_df_new['class_high']
del wh_df_new['class_low']

In [40]:
wh_df_new.head()

Unnamed: 0,Country,Region,HappinessRank,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year,WH_class
0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015,High
1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015,High
2,Denmark,Western Europe,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015,High
3,Norway,Western Europe,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015,High
4,Canada,North America,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2015,High


In [41]:
HTML('''<h1>Export data in a dictionary form</h1>''')

In [42]:
import json

# D3 accepts arrays of dictionaries, so here I convert the data frames into lists of dictionaries.
HF_array_of_dicts = hf_df_new.to_dict(orient='records')
WH_array_of_dicts = wh_df_new.to_dict(orient='records')
with open('HF_array_of_dicts.txt', 'w') as file:
    file.write(json.dumps(HF_array_of_dicts)) # use `json.loads` to do the reverse
with open('WH_array_of_dicts.txt', 'w') as file:
    file.write(json.dumps(WH_array_of_dicts)) # use `json.loads` to do the reverse
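
As a rough illustration of the round trip mentioned in the comments, the sketch below converts a small frame to records, serializes it, and rebuilds a frame with `json.loads`. The two rows reuse values shown earlier; the variable names are illustrative, and nothing is written to disk here.

```python
import json
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "hf_score": [7.66, 5.05],
})

records = toy.to_dict(orient="records")    # list of {column: value} dictionaries
text = json.dumps(records)                 # the string that would be written to the .txt file

restored = pd.DataFrame(json.loads(text))  # reverse: JSON text -> list of dicts -> frame
print(restored)
```

The records layout (one dictionary per row) is the shape that D3 data joins typically consume.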


In [43]:
hf_df_new.to_csv('hf_df_new.csv',sep=',')
wh_df_new.to_csv('wh_df_new.csv',sep=',')

### Visualization

In [44]:
HTML('''<h1>How Does Generosity Affect the World Happiness Score?</h1>''')

In [45]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
import numpy as np

wh_df_new=pd.read_csv('wh_df_new.csv')
wh_df_new.head()


Unnamed: 0.1,Unnamed: 0,Country,Region,HappinessRank,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year,WH_class
0,0,Switzerland,Western Europe,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015,High
1,1,Iceland,Western Europe,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015,High
2,2,Denmark,Western Europe,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015,High
3,3,Norway,Western Europe,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015,High
4,4,Canada,North America,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2015,High
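
The extra `Unnamed: 0` column appears because `to_csv` writes the index by default and `read_csv` then reads it back as a regular column. A minimal sketch of the difference, using an in-memory buffer instead of a file (the two rows reuse values from the table above):

```python
import io
import pandas as pd

toy = pd.DataFrame({"Country": ["Switzerland", "Iceland"],
                    "HappinessScore": [7.587, 7.561]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
# index=False leaves the index out of the file.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Passing `index=False` when exporting would make the later `del dataset['Unnamed: 0']` step unnecessary.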


In [46]:
HappinessScore = wh_df_new['HappinessScore']
Generosity = wh_df_new.Generosity

In [47]:
trace = go.Scatter(
    x = HappinessScore,
    y = Generosity,
    mode = 'markers',
    marker = dict(
        color = '#FFBAD2',
        line = dict(width = 1)
    )
)


In [48]:
#pass the traces as a plain list (go.Data is deprecated)
data = [trace]
layout=go.Layout(title="How Does Generosity Affect the World Happiness Score?", xaxis={'title':'Happiness Score'}, yaxis={'title':'Generosity'})
figure=go.Figure(data=data,layout=layout)
iplot(figure)


In [49]:
dataset = wh_df_new

In [50]:
#for visualization purposes only: rescale HappinessScore by 1,000,000 and health by 100 so that bubble sizes and axis values are easier to read
dataset['HappinessScore'] = dataset['HappinessScore']*1000000
dataset['health']=dataset['health']*100
dataset.head()

Unnamed: 0.1,Unnamed: 0,Country,Region,HappinessRank,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year,WH_class
0,0,Switzerland,Western Europe,1,7587000.0,1.39651,1.34951,94.143,0.66557,0.41978,0.29678,2.51738,2015,High
1,1,Iceland,Western Europe,2,7561000.0,1.30232,1.40223,94.784,0.62877,0.14145,0.4363,2.70201,2015,High
2,2,Denmark,Western Europe,3,7527000.0,1.32548,1.36058,87.464,0.64938,0.48357,0.34139,2.49204,2015,High
3,3,Norway,Western Europe,4,7522000.0,1.459,1.33095,88.521,0.66973,0.36503,0.34699,2.46531,2015,High
4,4,Canada,North America,5,7427000.0,1.32629,1.32261,90.563,0.63297,0.32957,0.45811,2.45176,2015,High


In [51]:
del dataset['Unnamed: 0']

In [52]:
HTML('''<h1>Does Health threaten Happiness?</h1>''')

In [53]:
from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML
init_notebook_mode(connected=True)


years = ['2015','2016']
# make list of Regions
Regions = []
for Region in dataset['Region']:
    if Region not in Regions:
        Regions.append(Region)
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['title'] = 'Does Happiness make people healthier?'
#health was rescaled to roughly 0-103 above, so the axis has to cover that range
figure['layout']['xaxis'] = {'range': [0, 105], 'title': 'Health'}
figure['layout']['yaxis'] = {'title': 'Happiness Score', 'type': 'log'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['sliders'] = {
    'args': [
        'transition', {
            'duration': 500,
            'easing': 'cubic-in-out'
        }
    ],
    'initialValue': '2015',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]

sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# make data
#start with the first year available in the data
year = 2015
for Region in Regions:
    dataset_by_year = dataset[dataset['year'] == year]
    dataset_by_year_and_cont = dataset_by_year[dataset_by_year['Region'] == Region]

    data_dict = {
        'x': list(dataset_by_year_and_cont['health']),
        'y': list(dataset_by_year_and_cont['HappinessScore']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_cont['Country']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            'size': list(dataset_by_year_and_cont['HappinessScore'])
        },
        'name': Region
    }
    figure['data'].append(data_dict)
    
# make frames
for year in years:
    frame = {'data': [], 'name': str(year)}
    for Region in Regions:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['Region'] == Region]

        data_dict = {
            'x': list(dataset_by_year_and_cont['health']),
            'y': list(dataset_by_year_and_cont['HappinessScore']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_cont['Country']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                'size': list(dataset_by_year_and_cont['HappinessScore'])
            },
            'name': Region
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    slider_step = {'args': [
        [year],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
       'transition': {'duration': 300}}
     ],
     'label': year,
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)

    
figure['layout']['sliders'] = [sliders_dict]

iplot(figure)
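
The frame loop filters with `dataset['year'] == int(year)`, which only matches if the `year` column holds integers. The column was first created as strings (`'2015'`, `'2016'`) and presumably became integer when the data was reloaded from CSV. The sketch below illustrates that behaviour on a toy frame; it is an illustration of the dtype issue, not part of the plot.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Switzerland", "Denmark"],
    "year": pd.Series(["2015", "2016"], dtype=object),   # years stored as strings
})

# Comparing a string column with an integer matches nothing.
matches_before = int((toy["year"] == 2015).sum())

# Writing to CSV and reading back lets pandas infer an integer dtype.
round_trip = pd.read_csv(io.StringIO(toy.to_csv(index=False)))
matches_after = int((round_trip["year"] == 2015).sum())

print(matches_before, matches_after)
```

Storing the year as an integer from the start would remove the dependence on the CSV round trip.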


## Merge 2 datasets

In [54]:
#merge on Country only: each freedom row is paired with every happiness row of the same country, regardless of year

merged_dataset = pd.merge(hf_df_new,wh_df_new, on=['Country'])


merged_dataset.head()

Unnamed: 0,year_x,ISO_Code,Country,pf_score,ef_score,hf_score,hf_rank,HF_quartile,HF_class,Region,...,HappinessScore,Economy,Family,health,Freedom,Trust,Generosity,Dystopia Residual,year_y,WH_class
0,2015,ALB,Albania,7.78,7.54,7.66,47.0,2.0,High,Central and Eastern Europe,...,4959000.0,0.87867,0.80434,81.325,0.35733,0.06413,0.14272,1.89894,2015,Low
1,2015,ALB,Albania,7.78,7.54,7.66,47.0,2.0,High,Central and Eastern Europe,...,4655000.0,0.9553,0.50163,73.007,0.31866,0.05301,0.1684,1.92816,2016,Low
2,2014,ALB,Albania,7.61,7.48,7.54,50.0,2.0,High,Central and Eastern Europe,...,4959000.0,0.87867,0.80434,81.325,0.35733,0.06413,0.14272,1.89894,2015,Low
3,2014,ALB,Albania,7.61,7.48,7.54,50.0,2.0,High,Central and Eastern Europe,...,4655000.0,0.9553,0.50163,73.007,0.31866,0.05301,0.1684,1.92816,2016,Low
4,2013,ALB,Albania,7.43,7.24,7.34,55.0,2.0,High,Central and Eastern Europe,...,4959000.0,0.87867,0.80434,81.325,0.35733,0.06413,0.14272,1.89894,2015,Low
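
As the repeated Albania rows show, merging on `Country` alone pairs every freedom year with every happiness year. One possible alternative, which is not what this notebook does, is to merge on both `Country` and `year`. The sketch below assumes both frames carry an integer `year` column; the rows reuse the Albania values shown above, with the happiness score in its original scale.

```python
import pandas as pd

hf = pd.DataFrame({
    "Country": ["Albania", "Albania"],
    "year": [2014, 2015],
    "hf_score": [7.54, 7.66],
})
wh = pd.DataFrame({
    "Country": ["Albania", "Albania"],
    "year": [2015, 2016],
    "HappinessScore": [4.959, 4.655],
})

by_country = pd.merge(hf, wh, on="Country")                  # every year pairing: 2 x 2 rows
by_country_year = pd.merge(hf, wh, on=["Country", "year"])   # only matching years are kept

print(len(by_country), len(by_country_year))
```

Since the freedom data covers 2008 to 2015 and the happiness data covers 2015 and 2016, a year-matched merge would keep only 2015, so the trade-off is fewer rows in exchange for same-year comparisons.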


In [55]:
merged_dataset.head()
merged_dataset.columns

Index(['year_x', 'ISO_Code', 'Country', 'pf_score', 'ef_score', 'hf_score',
       'hf_rank', 'HF_quartile', 'HF_class', 'Region', 'HappinessRank',
       'HappinessScore', 'Economy', 'Family', 'health', 'Freedom', 'Trust',
       'Generosity', 'Dystopia Residual', 'year_y', 'WH_class'],
      dtype='object')

In [56]:
#Compare how happier countries differ from less happy countries in their hf, pf and ef scores
#compute the total of each score per happiness class for visualization purposes
#High
High_hf_score_total = int(merged_dataset[merged_dataset['WH_class']=="High"]['hf_score'].sum())
High_pf_expression_total = int(merged_dataset[merged_dataset['WH_class']=="High"]['pf_score'].sum())
High_ef_score_total = int(merged_dataset[merged_dataset['WH_class']=="High"]['ef_score'].sum())
#Med
Med_hf_score_total = merged_dataset[merged_dataset['WH_class']=="Med"]['hf_score'].sum()
Med_pf_expression_total = merged_dataset[merged_dataset['WH_class']=="Med"]['pf_score'].sum()
Med_ef_score_total = merged_dataset[merged_dataset['WH_class']=="Med"]['ef_score'].sum()
#Low
Low_hf_score_total = merged_dataset[merged_dataset['WH_class']=="Low"]['hf_score'].sum()
Low_pf_expression_total = merged_dataset[merged_dataset['WH_class']=="Low"]['pf_score'].sum()
Low_ef_score_total = merged_dataset[merged_dataset['WH_class']=="Low"]['ef_score'].sum()

In [57]:
High_hf_score_total

3767
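
Totals grow with the number of rows in each class, so a class with more rows gets a larger sum even when its typical score is lower. A hedged alternative is to compare per-class means with `groupby`. The sketch below uses made-up numbers and only illustrates the difference between the two aggregations.

```python
import pandas as pd

# Toy data (values are made up): three "High" rows and one "Low" row.
toy = pd.DataFrame({
    "WH_class": ["High", "High", "High", "Low"],
    "hf_score": [8.0, 9.0, 7.0, 6.0],
    "pf_score": [9.0, 9.0, 6.0, 5.0],
    "ef_score": [7.0, 9.0, 8.0, 7.0],
})

score_cols = ["hf_score", "pf_score", "ef_score"]
totals = toy.groupby("WH_class")[score_cols].sum()    # depends on group size
means = toy.groupby("WH_class")[score_cols].mean()    # comparable across groups

print(totals)
print(means)
```

If means were used for the radar plot, the radial axis range would need to match the 0 to 10 score scale rather than the totals.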

In [58]:
from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML
init_notebook_mode(connected=True)
import plotly.graph_objs as go

data = [
    go.Scatterpolar(
        r = [Med_hf_score_total, Med_pf_expression_total, Med_ef_score_total],
        theta = ['Human Freedom Score', 'Personal Freedom Score', 'Economic Freedom Score'],
        fill = 'toself',
        name = 'Med Level of Happiness'
    ),
    go.Scatterpolar(
        r = [Low_hf_score_total, Low_pf_expression_total, Low_ef_score_total],
        theta = ['Human Freedom Score', 'Personal Freedom Score', 'Economic Freedom Score'],
        fill = 'toself',
        name = 'Low Level of Happiness'
    ),
    go.Scatterpolar(
        r = [High_hf_score_total, High_pf_expression_total, High_ef_score_total],
        theta = ['Human Freedom Score', 'Personal Freedom Score', 'Economic Freedom Score'],
        fill = 'toself',
        name = 'High Level of Happiness'
    )
]
layout = go.Layout(
    title = 'How do different freedom scores affect world happiness?',
    polar = dict(
        radialaxis = dict(
            visible = True,
            range = [0, 10000],
        )
    ),
    showlegend = True
)


fig = go.Figure(data=data, layout=layout)
iplot(fig, filename = "radar/multiple")

In [59]:

#correlate numeric columns only (non-numeric columns such as Country are skipped)
corrs = merged_dataset.corr(numeric_only=True)
data = [go.Heatmap(
    z=corrs.values,
    x=list(corrs.columns),
    y=list(corrs.index),
    xgap=4,
    ygap=4,
    colorscale='Viridis',
    text=corrs.round(2).values
)]

layout = go.Layout(
    title='How do world happiness factors and human freedom factors correlate?',
    xaxis=dict(
        tickmode='linear'
    ),
    autosize=True,
    margin=dict(
        l=170,
        b=120,
    )
)
 
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='heatmap-with-padding')


# Does this say that the less freedom people have, the more likely they are to be happy?

-----------

Mashael Alzaid

Delivered as part of the Data Viz course requirements

DCU@PNU 2019