In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
%matplotlib inline
import matplotlib.pyplot as plt # plotting; replaces the deprecated %pylab magic
%config InlineBackend.figure_formats = ['retina']
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import scipy.stats as stats
import seaborn as sns
sns.set()
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


Where are people the happiest, and why? In this report I explore the World Happiness Report 2019 dataset, which is built around 6 factors that could explain the happiness level of a country. The World Happiness Report 2019 is available here: https://worldhappiness.report/ed/2019/ and the dataset is available here: https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019

The 6 happiness-related components are:

1. Healthy life expectancy
2. Log of GDP per capita
3. Generosity
4. Corruption
5. Freedom
6. Social support

I want to understand what makes people happy in a stable and predictable environment. I therefore explore the 2019 dataset, which predates the COVID-19 pandemic. Understandably, uncertainty, instability and disappointment caused happiness to decline in 2020. Hopefully a recovery will allow economic activity to resume and repair the damage done by the COVID-19 outbreak. Once that recovery is complete, the key findings and insights should become relevant again.

The plan is to explore and visualize the data to build intuition and surface findings that will be useful later.

In [None]:
data = pd.read_csv('../input/world-happiness-report-2019/world-happiness-report-2019.csv')

In [None]:
#Top 10 happiest countries:
data.head(10)

In [None]:
# The 5 least happy countries:
data.tail()

In [None]:
#any missing data?
data.info()
data.isnull().sum()

Let's check for any significant outliers:

In [None]:
data.describe()

There are some null values. Because the amount of missing data is small, we can drop those rows altogether without worrying about skewing our findings.
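Before dropping rows, it can help to quantify how much data would be lost. The following is a minimal sketch on a small made-up frame; the values are illustrative and not taken from the dataset, and `missing_row_share` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def missing_row_share(df: pd.DataFrame) -> float:
    """Return the fraction of rows that contain at least one missing value."""
    return float(df.isnull().any(axis=1).mean())


# Toy frame with made-up values; the column names mirror the dataset
# only to keep the example readable.
toy = pd.DataFrame({
    "Country (region)": ["A", "B", "C", "D"],
    "Ladder": [1, 2, 3, 4],
    "Corruption": [10.0, np.nan, 30.0, 40.0],
})

share = missing_row_share(toy)
print(f"Rows with at least one missing value: {share:.0%}")

# Drop the incomplete rows and confirm how many remain.
cleaned = toy.dropna()
print(cleaned.shape)
```

If the share turned out to be large, dropping rows could bias the results, and filling in the missing values might be worth considering instead.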

In [None]:
data.shape

In [None]:
# Keep only the rows without any missing value
data = data.dropna()
data.shape

Let's focus on the 6 happiness-related factors and each country's ladder score. I will remove Positive affect, Negative affect and the standard deviation of the ladder, because these columns don't add value for the purposes of the current analysis.

In [None]:
# Drop the three columns that are not used in this analysis
data = data.drop(columns=['SD of Ladder', 'Positive affect', 'Negative affect'])

**Fig. 1 Heatmap**

In [None]:
plt.figure(figsize=(8,8))
# Correlate numeric columns only; the country name column is text
sns.heatmap(data.select_dtypes('number').corr(), annot=True);

**Fig. 2 Healthy life expectancy**

In [None]:
# Show the top 10 countries, in ladder order
sns.barplot(x="Healthy life\nexpectancy", y="Country (region)", data=data, palette='Accent', order=data['Country (region)'].head(10).tolist())

**Fig. 3 Social support**

In [None]:
# Show the top 10 countries, in ladder order
sns.barplot(x="Social support", y="Country (region)", data=data, palette='Accent', order=data['Country (region)'].head(10).tolist())


**Fig. 4 Log of GDP per capita**

In [None]:
# Show the top 10 countries, in ladder order
sns.barplot(x="Log of GDP\nper capita", y="Country (region)", data=data, palette='Accent', order=data['Country (region)'].head(10).tolist())

**Fig. 5 Scatter plots**

In [None]:
plt.figure(1, figsize = (8, 16))
n = 0
for x in ['Social support', 'Freedom', 'Corruption', 'Generosity', 
          'Log of GDP\nper capita', 'Healthy life\nexpectancy']:
    n += 1
    plt.subplot(3, 3, n)
    plt.subplots_adjust(hspace = 0.2, wspace = 0.4)
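    # Recent seaborn versions require x and y as keyword arguments
    sns.regplot(x='Ladder', y=x, data=data)
    # Column names contain line breaks; flatten them for the title
    plt.title('{} plot'.format(x.replace('\n', ' ')))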

plt.show()

The Corruption and Freedom points are scattered widely with no apparent trend (a weak relationship), whereas 'Log of GDP per capita', 'Healthy life expectancy' and 'Social support' show a clearer pattern.

There are three hypotheses we can draw from the data above. We see a strong correlation between a country's happiness (Ladder score) and three components:

* 'Healthy life expectancy' is strongly correlated with a country's happiness score (correlation of about 0.83). That is to say, the healthier the citizens are and the longer they live, the happier the people of that country are.
* Equally important, Social support also shows a correlation of about 0.83 with the happiness score.
* Finally, Log of GDP per capita, that is, the overall purchasing power and wealth of a country's citizens, shows a correlation of about 0.82 with the happiness score.
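The ranking of factors by correlation strength can also be read off numerically instead of from the heatmap. This is a minimal sketch on made-up values (not the real dataset), using only two factor columns as an illustration:

```python
import pandas as pd

# Toy values, made up for illustration only.
toy = pd.DataFrame({
    "Ladder": [1, 2, 3, 4, 5, 6],
    "Social support": [2, 1, 3, 5, 4, 6],
    "Generosity": [4, 6, 1, 3, 5, 2],
})

# Pearson correlation of every factor with the Ladder column,
# sorted by absolute strength (strongest first).
corr_with_ladder = (
    toy.corr()["Ladder"]
    .drop("Ladder")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_ladder.round(2))
```

Sorting by absolute value keeps strongly negative relationships near the top as well, since the sign only tells us the direction of the relationship.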

I am going to choose an arbitrary threshold of 25. This threshold seems most appropriate because the happiest countries average about 20 on each of the above indicators. I will adjust the threshold if this value proves not to be useful.
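The averages behind a threshold like this can be checked directly. Below is a minimal sketch on made-up values; `top_n` and the toy numbers are assumptions for illustration and are not the figures from the dataset.

```python
import pandas as pd

# Toy values, made up for illustration only.
toy = pd.DataFrame({
    "Ladder": [1, 2, 3, 4, 5, 6],
    "Social support": [3, 1, 2, 6, 5, 4],
    "Healthy life\nexpectancy": [2, 3, 1, 4, 6, 5],
})

top_n = 3  # hypothetical cutoff for the "happiest" countries
indicators = ["Social support", "Healthy life\nexpectancy"]

# Mean of each indicator among the top_n rows versus the whole table.
top_means = toy.sort_values("Ladder").head(top_n)[indicators].mean()
overall_means = toy[indicators].mean()
print(pd.DataFrame({"top": top_means, "overall": overall_means}))
```

Comparing the two columns side by side shows whether a chosen threshold separates the top group from the rest.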

**Fig. 6a Line chart of Social support for all countries**

In [None]:
ax = data.plot.line(x='Ladder', y='Social support', rot=0, figsize=(20, 8))
# Red horizontal line marks the threshold of 25
ax.hlines(y=25, xmin=0, xmax=156, color='red', label='threshold')

**Fig. 6b Line chart of Healthy life expectancy for all countries**

In [None]:
ax = data.plot.line(x='Ladder', y='Healthy life\nexpectancy', rot=0, figsize=(20, 8))
# Red horizontal line marks the threshold of 25
ax.hlines(y=25, xmin=0, xmax=156, color='red', label='threshold')

**Fig. 6c Line chart of Log of GDP per capita for all countries**

In [None]:
ax = data.plot.line(x='Ladder', y='Log of GDP\nper capita', rot=0, figsize=(20, 8))
# Red horizontal line marks the threshold of 25
ax.hlines(y=25, xmin=0, xmax=156, color='red', label='threshold')

Hypothesis testing:

Does higher social support make a country happier?

H0 (the null hypothesis): social support has no effect on a country's overall happiness.

H1 (the alternative hypothesis): social support contributes to a country's overall happiness.
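One common way to decide between H0 and H1 is a correlation test with a fixed significance level. The sketch below uses Spearman's rank correlation from `scipy.stats` on made-up paired values. The significance level `alpha = 0.05` is a conventional choice assumed here, not something taken from the report. Whether a low value means strong or weak support depends on whether the column holds ranks or scores, so the sign of the coefficient should be read with that in mind.

```python
from scipy import stats

# Made-up paired values for illustration (not taken from the dataset).
ladder = [1, 2, 3, 4, 5, 6, 7, 8]
social_support = [2, 1, 4, 3, 6, 5, 8, 7]

alpha = 0.05  # conventional significance level, an assumption here

# Spearman's rank correlation; H0 is "no monotonic association".
rho, p_value = stats.spearmanr(ladder, social_support)

# Reject H0 when the p-value falls below the significance level.
reject_h0 = p_value < alpha
print(f"rho={rho:.3f}, p={p_value:.4f}, reject H0: {reject_h0}")
```

A small p-value only indicates an association between the two columns. It does not show that social support causes happiness.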

**Fig. 7 Bar chart of the top 20 countries**

In [None]:
ax = data.iloc[:20, :].plot.bar(x='Country (region)', y='Social support', rot=0,figsize=(30,8))

In [None]:
# Summary statistics of the Social support column
data['Social support'].describe()

In [None]:
# Summary statistics of Social support for the top 20 happiest countries
data['Social support'].head(20).describe()

In [None]:
# Summary statistics of Social support for the top 10 happiest countries
data['Social support'].head(10).describe()

In [None]:
# Threshold (25) minus each country's Social support value
data['hypothesis'] = 25 - data['Social support']

In [None]:
data['hypobool'] = data['hypothesis'] >= 0
data.head(20)

**Some calculations:**

In [None]:
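# Pearson correlation between the ladder and the distance from the threshold
correlation, pvalue = stats.pearsonr(data['Ladder'], data['hypothesis'])
print("The correlation is: ", correlation)
print("The P-value is: ", pvalue)
# 'hypobool' is True where Social support is at or below the threshold of 25
print("Total True countries with Social support at or below 25: ", data['hypobool'].sum())
print("True in the 20 top happiest countries: ", data['hypobool'].head(20).sum())
print("True in the 10 top happiest countries: ", data['hypobool'].head(10).sum())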

The charts and data seem to favor the null hypothesis (H0): social support does not seem to contribute to a country's overall happiness. According to our benchmark, we see a clear negative correlation between a country's ladder score and the level of social support it provides. For example, among the top 10 happiest countries only 1 country (Austria) provides above-average social support. This can be explained by different scenarios. For example, people may be happier in countries where the majority have a better opportunity to earn a good income, good in the sense that the income covers the necessities and more.

**Request for additional data**

I believe that a good indicator of a country's well-being can be drawn from its level of infrastructure, e.g. how well roads and streets are built. The availability of public transport in a country is, in my opinion, also important. Having traveled around Europe, I observed that good, well-maintained infrastructure usually went together with happier people (an anecdotal observation). This data could therefore improve the happiness indicator and prediction model, and perhaps help explain and test different correlations and causal links.

References:
https://www.who.int/publications/data/gho/indicator-metadata-registry/imr-details/66

https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1145

https://ec.europa.eu/eurostat/web/products-eurostat-news/-/EDN-20200930-1#:~:text=Your%20key%20to%20European%20statistics&text=In%202018%2C%20the%20life%20expectancy,2%20region%20with%20available%20data.

https://worldhappiness.report/ed/2019/

https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019