### ***Quantifying Happiness : World Happiness Index Analysis***

Since 2012, the World Happiness Report has been published nearly every year by the United Nations Sustainable Development Solutions Network. The goal of the report is to provide data that can help guide policy makers. The index is shown alongside key factors such as:
1. GDP per capita,
2. Healthy life expectancy,
3. Available social support,
4. The degree of freedom to make decisions about one’s life,
5. Generosity in a society, and
6. The absence of corruption.

This kernel / notebook investigates the factors listed alongside the index and how they might be correlated with it. The prime motivation behind the project is to understand why the overall happiness of a country might vary over the years and how policy makers can use this index to make better decisions for citizens.

***Definition of factors***
All of the scores above are obtained by the U.N. from the Gallup World Poll (GWP), a set of nationally representative surveys conducted in more than 160 countries and over 140 languages.
1. ***Happiness Index*** : 
The main method is a ***Cantril Ladder***, which gauges public opinion with one very specific question. The question itself serves as the definition of the factor :
“Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder  represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?”

2. ***GDP Per Capita*** :
GDP per capita is a measure of a country's economic output that accounts for its number of people: it divides the country's gross domestic product by its total population. That makes it a widely used proxy for a country's standard of living, indicating how prosperous a country feels to each of its citizens. Here the U.N. uses a time series to fill in unreleased GDP per capita values with references from OECD Economic Outlook No 102 (Edition November 2017) and then, if missing, forecasts from the World Bank’s Global Economic Prospects (Last Updated: 06/04/2017). The GDP growth forecasts are adjusted for population growth by subtracting the 2015-16 population growth, which is used as the projected 2016-17 growth.

3. ***Healthy Life Expectancy***
Healthy life expectancy (HLE) is a population health measure that combines mortality data with morbidity or health status data to estimate expected years of life in good health for persons at a given age. The time series of healthy life expectancy at birth are calculated by the authors based on data from the World Health Organization (WHO), the World Development Indicators (WDI), and statistics published in journal articles.

4. ***Social support (or having someone to count on in times of trouble)***
Social support is the national average of the binary responses (either 0 or 1) to the GWP question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”.

5. ***Freedom to make life choices***
Freedom to make life choices is the national average of responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

6. ***Generosity***
Generosity is the residual from regressing the national average of responses to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.

7. ***Corruption Perception***
The measure is the national average of the survey responses to two questions in the GWP: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” At the individual level, the overall perception is the average of the two 0-or-1 responses. If the perception of government corruption is missing, the U.N. uses the perception of business corruption as the overall perception. The corruption perception at the national level is then the average of the overall perception at the individual level.
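
To make the survey-based definitions above concrete, here is a minimal sketch of how such scores could be computed. The respondent-level table, its column names and all values are made up for illustration; the report's own processing may differ in its details.

```python
import numpy as np
import pandas as pd

# Hypothetical respondent-level data: one row per survey respondent.
# Column names and values are made up for illustration.
responses = pd.DataFrame({
    "country":        ["A", "A", "A", "B", "B", "B"],
    "social_support": [1, 1, 0, 1, 0, 0],            # 1 = has someone to count on
    "corrupt_gov":    [1, np.nan, 0, 1, 1, np.nan],  # NaN = question not answered
    "corrupt_biz":    [1, 1, 0, 0, 1, 1],
})

# Individual-level corruption perception: mean of the two 0/1 answers.
# mean(axis=1) skips NaN, so a missing government answer falls back to
# the business answer alone.
responses["corruption"] = responses[["corrupt_gov", "corrupt_biz"]].mean(axis=1)

# National scores are plain averages of the individual responses.
national = responses.groupby("country")[["social_support", "corruption"]].mean()

# Generosity: residual from regressing the national donation rate on
# GDP per capita (made-up national values, one entry per country).
gdp = np.array([0.3, 0.6, 0.9, 1.2, 1.5])
donated = np.array([0.20, 0.35, 0.25, 0.50, 0.45])
slope, intercept = np.polyfit(gdp, donated, deg=1)
generosity = donated - (intercept + slope * gdp)
```

A positive `generosity` value means a country donates more than its GDP per capita alone would predict; by construction the residuals average out to zero.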

***Data Gathering***
The data for this project was gathered from four sources :
1. World Happiness Report : The United Nations releases a report annually. It contains an in-depth explanation of the definitions of the factors and the methodologies used, as well as information on the participating countries. It also provides CSV files with both granular and aggregate-level data.

2. Wikipedia : Wikipedia was scraped using ***pandas*** to get rankings. It has a neat representation of the top countries per year in an easy-to-obtain format.

3. World Mental and Substance Disorders dataset : A WHO data set in CSV format. Its rankings were used to check whether any correlations exist with the World Happiness Index.

4. World Suicide Data : A WHO data set in CSV format. Its rankings were used to check whether any correlations exist with the World Happiness Index.

In [None]:
# Importing required libraries for data gathering and storage
import pandas as pd
import re
import io
import requests

In [None]:
# World Happiness Index : Wikipedia
# Adjacent string literals are joined, so the URL contains no stray whitespace
df2018 = pd.read_html("https://en.wikipedia.org/wiki/World_Happiness_Report"
                      "#2018_World_Happiness_Report")
df_happiness_2018 = df2018[4].rename(columns=df2018[4].iloc[0]).drop(df2018[4].index[0])
df_happiness_2018['Score'] = df_happiness_2018['Score'].astype('float64')
df_happiness_2018['GDP per capita'] = df_happiness_2018['GDP per capita'] \
.astype('float64') 

#World Happiness Index : CSV files
df_happiness_2015 = pd.read_csv('./Datasets/2015.csv')
df_happiness_2016 = pd.read_csv('./Datasets/2016.csv')
df_happiness_2017 = pd.read_csv('./Datasets/2017.csv')
    
# World Suicide Statistics : Wikipedia
dflist = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_suicide_rate')
df_suicide = dflist[2].rename(columns=dflist[2].iloc[0]).drop(dflist[2].index[0:2])
df_suicide = df_suicide.reset_index().drop(['index'], axis=1)
df_suicide.columns = ['both_sexes_rank','Country','Continent','both_sexes_number', \
                             'males_rank','males_number','females_rank',
                             'females_number','m_f_ratio']
# Strip "(more info)" links and footnote markers such as [a] or [1] from country names
df_suicide['Country'] = df_suicide['Country'].apply(lambda v:
                                                    re.sub(r' [(]more info[)]', '', v))
df_suicide['Country'] = df_suicide['Country'].apply(lambda v: re.sub(r'\[a\]', '', v))
df_suicide['Country'] = df_suicide['Country'].apply(lambda v: re.sub(r'\[\d\]', '', v))
df_suicide['both_sexes_number'] = df_suicide['both_sexes_number'].astype('float64')

#World Mental Health Data
df_mental=pd.read_csv("./Datasets/mental-health.csv")

### ***Data Cleaning***

The datasets published as part of the U.N. report each year had some discrepancies in the variable names used, and they did not all have the same number of columns. Data munging was thus done using pandas to make the datasets uniform.
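
As a general illustration of this kind of munging, the sketch below renames columns by name rather than by position, so it does not depend on the column order in each file. The miniature frames, the raw column spellings and the values are assumptions made for the example, not the actual contents of the CSV files.

```python
import pandas as pd

# Hypothetical miniature versions of two yearly files; the real CSVs
# have more columns. Names and values here are made up.
raw_2016 = pd.DataFrame({
    "Country": ["A"], "Happiness Rank": [1], "Happiness Score": [7.5],
    "Region": ["Somewhere"],
})
raw_2017 = pd.DataFrame({
    "Country": ["B"], "Happiness.Rank": [1], "Happiness.Score": [7.4],
    "Whisker.high": [7.6],
})

# Map every known spelling to one canonical name.
CANONICAL = {
    "Happiness Rank": "Overall Rank", "Happiness.Rank": "Overall Rank",
    "Happiness Score": "Score", "Happiness.Score": "Score",
}
KEEP = ["Country", "Overall Rank", "Score"]

def standardize(df):
    """Rename columns to the canonical names and keep only the shared ones."""
    return df.rename(columns=CANONICAL)[KEEP]

clean_2016 = standardize(raw_2016)
clean_2017 = standardize(raw_2017)
```

Selecting `KEEP` at the end also fixes the column order, so the cleaned frames can be compared or concatenated directly.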

In [None]:
#Obtained data has different formats in terms of column names, number of columns
#Data needs to be standardized
from IPython.display import display, HTML

print ("World Happiness Index 2018 :")
display(df_happiness_2018.head(10))

#Cleaning 2017 data.
df_happiness_2017.drop(['Whisker.high', 'Whisker.low', 'Dystopia.Residual'],axis = 1, inplace = True)
df_happiness_2017.columns = ['Country', 'Overall Rank', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
df_happiness_2017 = df_happiness_2017[df_happiness_2018.columns]
print ("World Happiness Index 2017 :")
display(df_happiness_2017.head(10))

#Cleaning 2016 data.
df_happiness_2016.drop(['Region', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Dystopia Residual'], axis = 1, inplace = True)
df_happiness_2016.columns = ['Country', 'Overall Rank', 'Score','GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Perceptions of corruption', 'Generosity']
df_happiness_2016 = df_happiness_2016[df_happiness_2018.columns]
print ("World Happiness Index 2016 :")
display(df_happiness_2016.head(10))

#Cleaning 2015 data.
df_happiness_2015.drop(['Region', 'Standard Error', 'Dystopia Residual'], axis = 1, inplace = True)
df_happiness_2015.columns = ['Country', 'Overall Rank', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Perceptions of corruption', 'Generosity']
df_happiness_2015 = df_happiness_2015[df_happiness_2018.columns]
print ("World Happiness Index 2015 :")
display(df_happiness_2015.head(10))

#World Suicide Data
print ("World Suicide Data :")
display(df_suicide.head(10))

#Mental Health Data (already loaded in the data gathering cell)
print ("World Mental Health Data :")
display(df_mental.head(10))

### ***Data Visualization***

To determine trends we followed a "Granularity Approach": first we focused on the macro trends, then drilled down into the correlations between specific countries and the 6 factors.

All visualizations created in Tableau have their screenshots attached, ***followed by cell blocks that have Tableau embedded in them*** (login is required to view the interactive Tableau worksheets). To see only the visualizations, please check Tableau Public for user : aamalik@andrew.cmu.edu

***1. Macro trends : World Map Cluster***
    
In order to come up with a starting point we decided to cluster the countries based on their happiness index.
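
The clustering itself was done in Tableau. As a rough stand-in, the sketch below splits countries into four equally sized groups by score using quantile binning; this is a simpler technique than Tableau's clustering and is shown only to illustrate the idea. The country labels and scores are made up.

```python
import pandas as pd

scores = pd.DataFrame({
    "Country": list("ABCDEFGH"),
    "Score":   [7.6, 7.3, 6.4, 6.0, 5.1, 4.8, 3.9, 3.2],  # made-up values
})

# Four score bands, ordered from lowest to highest quartile.
labels = ["unhappiest", "third", "second", "happiest"]
scores["Cluster"] = pd.qcut(scores["Score"], q=4, labels=labels)
```

Quantile bins always hold roughly the same number of countries, whereas a real clustering algorithm lets group sizes follow the gaps in the data.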

In [None]:
from IPython.display import Image
Image("Tableau_Visualizations/World_Map.PNG")

In [None]:
%%HTML 
<script type='text/javascript' src='https://us-east-1.online.tableau.com/javascripts/api/viz_v1.js'>
</script><div class='tableauPlaceholder' style='width: 1000px; height: 827px;'>
<object class='tableauViz' width='1000' height='827' style='display:none;'>
<param name='host_url' value='https%3A%2F%2Fus-east-1.online.tableau.com%2F' /> 
<param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;abhimalik' />
<param name='name' value='Dashboard1&#47;Dashboard1' /><param name='tabs' value='no' />
<param name='toolbar' value='yes' /><param name='showAppBanner' value='false' />
<param name='filter' value='iframeSizedToWindow=true' /></object></div>

#### ***Observation***

The results are unsurprising. The countries that form the top cluster are from :
1. North America
2. Europe
3. Scandinavia
4. Australia

The second happiest cluster is formed by :
1. Russia
2. Mexico
3. South American countries like Brazil and Argentina
4. A few countries from Europe
5. A few eastern nations

The third happiest cluster is formed by :
1. African countries and their neighbours

The final cluster consists of :
1. A large number of African countries
2. Venezuela, which particularly sticks out in South America
3. India

***One interesting observation is that the countries forming these clusters are not always close to each other: many African nations are in the unhappiest cluster, but some of their immediate neighbours are much happier.***

***2. Drilling Down : Country Wise Trends***

After observing the clusters at a global scale, and following our macro to micro approach, we drilled down to find which countries had gained and lost rank over time. We came up with a visual that shows the change in rank from 2015 to 2018.

For countries that gained rank (Green) : The start of the bar is the rank of the country in 2015 and the end is the rank in 2018.

For countries that lost rank (Red) : The start of the bar is the rank of the country in 2018 and the end is the rank in 2015.
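
The rank change behind this visual can be sketched in pandas as follows. The two small frames stand in for the cleaned 2015 and 2018 tables; the countries and ranks are made up, and the `Rank Change` and `Direction` column names are our own choice for the example.

```python
import pandas as pd

rank_2015 = pd.DataFrame({"Country": ["A", "B", "C", "D"], "Overall Rank": [1, 2, 3, 4]})
rank_2018 = pd.DataFrame({"Country": ["A", "B", "C", "E"], "Overall Rank": [2, 1, 3, 4]})

# Inner join keeps only countries present in both years.
change = rank_2015.merge(rank_2018, on="Country", suffixes=("_2015", "_2018"))

# Positive = moved up the list (a smaller rank number is better).
change["Rank Change"] = change["Overall Rank_2015"] - change["Overall Rank_2018"]
change["Direction"] = change["Rank Change"].map(
    lambda d: "gained" if d > 0 else "lost" if d < 0 else "unchanged"
)
```

Countries that appear in only one of the two years (D and E here) drop out of the comparison, which is worth keeping in mind when the list of surveyed countries changes between reports.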

In [None]:
Image("Tableau_Visualizations/Country_Trend.PNG")

***Observation***

There are definitely countries that have moved up the list over the years, as well as countries that have become sadder and lost rank.

In [None]:
%%HTML
<script type='text/javascript' src='https://us-east-1.online.tableau.com/javascripts/api/viz_v1.js'>
</script><div class='tableauPlaceholder' style='width: 1279px; height: 536px;'>
<object class='tableauViz' width='1279' height='536' style='display:none;'>
<param name='host_url' value='https%3A%2F%2Fus-east-1.online.tableau.com%2F' /> 
<param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;abhimalik' />
<param name='name' value='Dashboard1&#47;Sheet15' />
<param name='tabs' value='no' /><param name='toolbar' value='yes' />
<param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>

From the above visualization it was clear that some countries are improving much more than others, while others are falling sharply in the world ranks. We thus shifted our focus to the top 10 rank falls and top 10 rank gains.

We go on to visualize the biggest rank falls and the biggest rank gains.
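
Picking out the biggest movers is a one-liner in pandas once the rank change is known. A minimal sketch, assuming a `Rank Change` column where positive values mean a gain in rank; the data is made up and `N` is set to 2 only to keep the example small.

```python
import pandas as pd

change = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Rank Change": [12, -30, 5, 41, -8, 0],  # made-up; positive = gained rank
})

N = 2  # the analysis here uses 10
top_gains = change.nlargest(N, "Rank Change")
top_falls = change.nsmallest(N, "Rank Change")
```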

In [None]:
Image("Tableau_Visualizations/Rank_Falls.png")

In [None]:
Image("Tableau_Visualizations/Rank_Gains.png")

***Observation***

Ivory Coast is the biggest rank gainer over the years, while Venezuela has become significantly unhappier.

Next, we use scatter plots to investigate whether a correlation exists between rank and the factors :

***3. Correlations***

The U.N, over the years, has been able to identify which factors might be influencing the index the most. While the score and these factors are calculated independently, they are listed together to serve as markers for policy makers. They might be the key to identifying why certain trends might exist and why some countries gain or lose rank over time.

To investigate this we drew scatter plots and looked, empirically, for correlations, if any.
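
The scatter plots were read by eye; a numerical companion is the Pearson correlation of each factor with the score. The sketch below uses a small made-up table with the same column names as the cleaned data, so the numbers themselves carry no meaning.

```python
import pandas as pd

df = pd.DataFrame({
    "Score":          [7.5, 6.9, 6.1, 5.4, 4.6, 3.8],
    "GDP per capita": [1.40, 1.30, 1.10, 0.90, 0.60, 0.30],
    "Generosity":     [0.30, 0.10, 0.25, 0.15, 0.28, 0.20],
})  # made-up values

# Pearson correlation of every factor with the score, strongest first.
corr_with_score = df.corr()["Score"].drop("Score").sort_values(ascending=False)
```

Pearson correlation only captures linear association, so a factor with a curved relationship to the score can look weaker here than it does in the plot.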

In [None]:
Image("Tableau_Visualizations/Correlation_1.png")

In [None]:
%%HTML
<script type='text/javascript' src='https://us-east-1.online.tableau.com/javascripts/api/viz_v1.js'>
</script><div class='tableauPlaceholder' style='width: 1000px; height: 827px;'>
<object class='tableauViz' width='1000' height='827' style='display:none;'>
<param name='host_url' value='https%3A%2F%2Fus-east-1.online.tableau.com%2F' /> 
<param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;abhimalik' />
<param name='name' value='Dashboard1&#47;Dashboard3' /><param name='tabs' value='no' />
<param name='toolbar' value='yes' /><param name='showAppBanner' value='false' />
<param name='filter' value='iframeSizedToWindow=true' /></object></div>

In [None]:
Image("Tableau_Visualizations/Correlation_2.png")

In [None]:
%%HTML
<script type='text/javascript' src='https://us-east-1.online.tableau.com/javascripts/api/viz_v1.js'>
</script><div class='tableauPlaceholder' style='width: 1000px; height: 827px;'>
<object class='tableauViz' width='1000' height='827' style='display:none;'>
<param name='host_url' value='https%3A%2F%2Fus-east-1.online.tableau.com%2F' />
<param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;abhimalik' />
<param name='name' value='Dashboard1&#47;Dashboard4' />
<param name='tabs' value='no' /><param name='toolbar' value='yes' />
<param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>

***Observations***

From the plots above, some correlations appear to exist between the score and the factors :
1. ***Score vs GDP per capita***
A very strong correlation can be observed here. This is expected: a higher GDP per capita suggests more value added in the economy and higher incomes and expenditure, which implies a higher standard of living and more spending on goods and services such as healthcare and education. A better quality of life can thus plausibly correlate with the happiness index.

    Some observations worth noting here are the clusters that form at the very top. The bubbles in that cluster are primarily Scandinavian countries. They have a higher happiness index than countries like the UAE, Luxembourg, Singapore and Qatar, even though the latter have higher GDP per capita. At the other end of the spectrum is Somalia, a country whose GDP is reported as 0 (possibly because the GDP values are scaled) and which still ranks higher on the happiness index than many other nations. This result is really surprising, as Somalia has been embroiled in civil war since 1991 and yet its citizens report a happiness score of 4.9. It raises questions about the methodology used by the U.N., or whether only certain parts of Somalia were surveyed. However, given that the U.N. used survey participants distributed geographically across the country to get a holistic representation, the result remains truly surprising.


2. ***Score vs Healthy Life Expectancy***
Another correlation that comes as little surprise is between the happiness index and Healthy Life Expectancy. Again the Scandinavian countries form the top cluster, while Nigeria is an outlier with a score of 5.11 even though it is significantly low on the life expectancy score.


3. ***Score vs Social Support***

    There is a strong correlation here. The United Arab Emirates is an exception: it scores poorly on social support and yet ranks significantly higher on the happiness index.


4. ***Score vs Corruption***

    Uninteresting


5. ***Score vs Generosity***

    Uninteresting


6. ***Score vs Freedom to make life choices***

    Uninteresting
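
Outliers such as the ones noted for Score vs GDP per capita were spotted visually. One possible way to flag them programmatically is to fit a straight line and look for large residuals, as in the sketch below. The data is made up, and the cut-off of 1.5 residual standard deviations is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country": list("ABCDEFGH"),
    "GDP per capita": [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 0.0],
    "Score":          [3.6, 4.0, 4.5, 5.0, 5.4, 6.0, 6.4, 5.0],  # H sits well above the trend
})  # made-up values

# Least-squares line: Score ~ intercept + slope * GDP per capita
slope, intercept = np.polyfit(df["GDP per capita"], df["Score"], deg=1)
df["Residual"] = df["Score"] - (intercept + slope * df["GDP per capita"])

# Flag points more than 1.5 residual standard deviations from the line
# (the 1.5 threshold is an arbitrary choice for this sketch).
threshold = 1.5 * df["Residual"].std()
outliers = df[df["Residual"].abs() > threshold]
```

A positive residual marks a country that is happier than its GDP per capita would suggest, a negative one a country that is less happy than expected.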

### Hypothesis Testing

Based on the exploration of the data set above, we came up with the following hypotheses to test :

#### 1. Can money buy happiness? Is GDP per capita the biggest reason for happier countries?

#### 2. Why are Scandinavian countries doing so well? They top the list every year

Besides these hypotheses we wanted to find out :
#### What can policy makers take away from this?

#### Hypothesis 1 : Can money buy happiness?

#### Answer : Not always!

The correlation plot shows a very strong correlation between happiness and GDP per capita, which would suggest that GDP per capita is a good estimate of how well a country is doing. However, this trend does not always hold.

On drilling down into the data we came up with an interesting observation. We took India and some of its neighboring countries and observed the following:

In [None]:
Image("Tableau_Visualizations/India_Neighbors.png")

***Observation***

Even though India has a higher GDP than countries like Bangladesh, Nepal and Pakistan, it scores lower on the happiness index relative to them. To investigate this further we plotted the following :

In [None]:
Image("Tableau_Visualizations/India_Asia.png")

***Observation***
The one thing that stood out here is the difference in Social Support. No other factor seemed to explain why India would rank lower. This could be an indication that the country's social support system is facing some issues. The corresponding survey question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”, might reveal some insight. Policy makers can investigate what is going wrong here.
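
This kind of comparison can also be done numerically by measuring how far one country sits from the average of its neighbours on each factor. The sketch below uses placeholder labels and made-up values rather than the real figures; note that the factors are on different scales, so the raw gaps are only a first hint.

```python
import pandas as pd

# Placeholder rows: "focus" is the country of interest, the rest its neighbours.
factors = pd.DataFrame(
    {
        "GDP per capita": [0.80, 0.55, 0.45, 0.65],
        "Social support": [0.60, 1.00, 1.10, 0.95],
        "Generosity":     [0.20, 0.18, 0.22, 0.21],
    },
    index=["focus", "neighbour_1", "neighbour_2", "neighbour_3"],  # made-up values
)

neighbours = factors.drop(index="focus")

# How far the focus country sits from its neighbours' average, factor by factor.
gap = factors.loc["focus"] - neighbours.mean()
largest_shortfall = gap.idxmin()
```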

A similar case can be made for the outliers in the plot of Score vs GDP : Somalia and Botswana

In [None]:
Image("Tableau_Visualizations/Outliers.png")

***Observation***

We compared them to the world's happiest nation, Finland, to see why Somalia is so happy despite adverse conditions. Its GDP is very low and it has been embroiled in a civil war for over two decades, yet it manages to be happier than Botswana, a country with a stable government and a higher GDP.

It would be interesting for policy makers to investigate why this might happen and what lessons can be learned from Somalia's situation.

***2. Why are Scandinavian countries doing so well?***

Besides GDP and other obvious factors, the reason why Scandinavian countries did well year after year remained elusive to us.

One way in which we decided to distinguish these countries was by temperature, and we stumbled across an interesting finding.

In [None]:
Image("Tableau_Visualizations/Temperature.png")

***Observation***

There is a statistically significant correlation between the temperature of a country and its happiness score: colder countries tend to rank higher than hotter ones. There is an interesting theory to back this.

Countries with colder climates and harsh weather have historically seen the formation of tighter-knit social groups. This seems to have been a survival instinct, and it may have translated into better social support and thus happier individuals in these countries.
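
The temperature finding came from the Tableau view, and the specific significance test is not recorded here. One assumption-light way to check such a claim is a permutation test on the Pearson correlation, sketched below with made-up temperatures and scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up values: average temperature (degrees C) and happiness score per country.
temperature = np.array([2.0, 5.0, 8.0, 12.0, 15.0, 19.0, 22.0, 25.0, 27.0, 29.0])
score = np.array([7.5, 7.3, 6.9, 6.8, 6.0, 5.6, 5.5, 4.9, 4.4, 4.6])

observed = np.corrcoef(temperature, score)[0, 1]

# Shuffle the scores many times to see how often chance alone produces
# a correlation at least as strong as the observed one.
n_perm = 5000
perm_r = np.array([
    np.corrcoef(temperature, rng.permutation(score))[0, 1]
    for _ in range(n_perm)
])

# Two-sided p-value, with +1 so the estimate is never exactly zero.
p_value = (np.sum(np.abs(perm_r) >= abs(observed)) + 1) / (n_perm + 1)
```

A small `p_value` says the association is unlikely to be due to chance in this sample; it says nothing about whether cold weather causes happiness.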

#### ***Experimentation***

Besides the World Happiness Index, there are a number of other indices that policy makers consult. A popular one is the World Suicide Rank. A study conducted at the University of Warwick reported that countries with the highest happiness index also tend to have the highest suicide rates [https://www.sciencedaily.com/releases/2011/04/110421082641.htm]. We investigate this claim as well, along with the link between the happiness index and mental health and drug usage.


In [None]:
Image("Tableau_Visualizations/Mental.png")

In [None]:
Image("Tableau_Visualizations/Suicide.png")

***Observation***
We did not find any statistically significant correlation between the score and either mental health issues or suicide rates.
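
Since these comparisons are rank-based, one way to reproduce them outside Tableau is to join the two tables on country and compute a Spearman correlation, which is simply the Pearson correlation of the ranks. The sketch below assumes column names like those in the cleaned frames and uses made-up values.

```python
import pandas as pd

happiness = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Score":   [7.4, 6.8, 6.1, 5.0, 4.2],               # made-up values
})
suicide = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "F"],
    "both_sexes_number": [12.0, 9.5, 15.1, 8.0, 11.0],  # made-up values
})

# Keep only countries that appear in both data sets.
merged = happiness.merge(suicide, on="Country", how="inner")

# Spearman correlation = Pearson correlation of the ranks.
rho = merged["Score"].rank().corr(merged["both_sexes_number"].rank())
```

Country names often differ between sources (footnote markers, alternative spellings), so it is worth checking how many rows survive the join before reading anything into `rho`.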

#### Conclusion / Lesson

##### 1. Exploratory data analysis is a challenging task; it is good to come up with a strategy beforehand

##### 2. Survey data can be tricky at times

##### 3. Tableau is not magic, it takes time and effort to understand