Hello, this is Ardian (a.k.a. Lovelandmark). In this notebook we're going to do an exploratory data analysis and data visualization of the World Happiness Report 2021, focusing on Southeast Asia, using Python.

> The World Happiness Report measures the happiness of nations around the world. Its happiness (ladder) scores come from the Gallup World Poll, and the report explains them with six factors: logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity and perceptions of corruption.

In this notebook we will discuss:

1. Data preparation and exploratory data analysis,
2. Correlation between the six happiness factors,
3. Data visualization - what makes Southeast Asia stand out among other regions, and
4. Remarks for Southeast Asia's happiness development.

# Data Preparation

We will use the World Happiness Report 2021 dataset, available on Kaggle: https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021
Then we import all the required libraries: numpy, pandas, matplotlib, seaborn, and KMeans from sklearn.cluster.

In [None]:
import os

import numpy as np               # numerical arrays
import pandas as pd              # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as mpl  # plotting; aliased as `mpl` throughout this notebook
import seaborn as sns            # statistical plots built on matplotlib
from sklearn.cluster import KMeans

# List every file under Kaggle's read-only input directory to confirm the CSV path
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Steps:
No.1 Import the dataset. We use only the 2021 World Happiness Report .csv file, since data from other years isn't the subject of this discussion. Only nine columns are kept: the country name, the region it belongs to, the ladder score, and the six factors used to explain it.

In [None]:
dataset = pd.read_csv('/kaggle/input/world-happiness-report-2021/world-happiness-report-2021.csv')
dataset = dataset[['Country name', 'Regional indicator', 'Ladder score',
                   'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
                   'Freedom to make life choices', 'Generosity',
                   'Perceptions of corruption']]

# Replace spaces with underscores in column names (e.g. 'Ladder score' -> 'Ladder_score')
dataset.columns = dataset.columns.str.replace(' ', '_', regex=False)
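If the CSV layout ever changes (for example, a later edition renames a column), the selection above fails with a generic KeyError. Below is a minimal sketch of a more defensive version. `select_and_rename` and `KEEP` are hypothetical names introduced here, and the one-row frame uses made-up values rather than real WHR data.

```python
import pandas as pd

# Columns the notebook keeps, as they appear in the CSV header
KEEP = ['Country name', 'Regional indicator', 'Ladder score',
        'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
        'Freedom to make life choices', 'Generosity',
        'Perceptions of corruption']


def select_and_rename(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: keep `columns`, then replace spaces with underscores."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        # Name the absent columns instead of failing with pandas' generic message
        raise KeyError(f"Columns not found in the CSV: {missing}")
    out = df[columns].copy()  # explicit copy, so the result is independent of `df`
    out.columns = out.columns.str.replace(' ', '_', regex=False)
    return out


# Made-up one-row frame with an extra column that should be dropped
row = {c: 0 for c in KEEP}
row['Extra column'] = 1
toy = pd.DataFrame([row])

clean = select_and_rename(toy, KEEP)
print(list(clean.columns))
```

In the notebook this would be called as `dataset = select_and_rename(pd.read_csv('/kaggle/input/world-happiness-report-2021/world-happiness-report-2021.csv'), KEEP)`.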

No.2 Next, we check that the dataset loaded properly.

In [None]:
print(dataset)

No.3 Great! The dataset loaded properly. Now let's make a new dataset that contains ONLY the Southeast Asian nations.

In [None]:
ds_sea = dataset[dataset['Regional_indicator'] == 'Southeast Asia']
print(ds_sea)
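Before relying on statistics for one region, it helps to know how many countries each region contributes, since a handful of points makes correlations fragile. The sketch below runs on made-up rows (the country names are placeholders, not WHR data). It also writes the filter with `.loc` and `.copy()`, which avoids pandas' SettingWithCopyWarning if `ds_sea` is modified later.

```python
import pandas as pd

# Made-up illustrative rows (NOT real WHR 2021 values), shaped like the renamed dataset
dataset = pd.DataFrame({
    'Country_name': ['Country A', 'Country B', 'Country C', 'Country D', 'Country E'],
    'Regional_indicator': ['Southeast Asia', 'Western Europe', 'Southeast Asia',
                           'Latin America and Caribbean', 'Southeast Asia'],
    'Ladder_score': [6.1, 7.2, 5.3, 6.0, 4.9],
})

# How many countries each region contributes; small groups give noisy statistics
counts = dataset['Regional_indicator'].value_counts()
print(counts)

# Boolean mask, then .copy() so later edits to ds_sea don't touch `dataset`
mask = dataset['Regional_indicator'].eq('Southeast Asia')
ds_sea = dataset.loc[mask].copy()
print(ds_sea['Country_name'].tolist())
```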

No.4 Check the contents of both datasets with info().

In [None]:
# info() prints its summary itself and returns None, so wrapping it in print() would also print "None"
dataset.info()

In [None]:
ds_sea.info()
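`info()` reports non-null counts per column. A more direct check for gaps before computing correlations is `isna().sum()`. The rows below are made up, with one value deliberately left missing; note that pandas' `corr()` skips missing values pair by pair, so such a country silently drops out of some correlations.

```python
import numpy as np
import pandas as pd

# Made-up rows with one deliberately missing value (NOT real WHR data)
ds_sea = pd.DataFrame({
    'Country_name': ['Country A', 'Country C', 'Country E'],
    'Ladder_score': [6.1, 5.3, 4.9],
    'Generosity': [0.10, np.nan, -0.05],
})

# Number of missing entries in each column
missing = ds_sea.isna().sum()
print(missing[missing > 0])

# Countries with at least one missing value
incomplete = ds_sea.loc[ds_sea.isna().any(axis=1), 'Country_name'].tolist()
print(incomplete)
```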

No.5a With both datasets ready, let's start by plotting each factor against the ladder score, in both datasets.

In [None]:
sns.pairplot(dataset,
             x_vars=['Logged_GDP_per_capita', 'Social_support', 'Healthy_life_expectancy',
                     'Freedom_to_make_life_choices', 'Generosity', 'Perceptions_of_corruption'],
             y_vars=['Ladder_score'])

In [None]:
sns.pairplot(ds_sea,
             x_vars=['Logged_GDP_per_capita', 'Social_support', 'Healthy_life_expectancy',
                     'Freedom_to_make_life_choices', 'Generosity', 'Perceptions_of_corruption'],
             y_vars=['Ladder_score'])

As the ladder score goes up, four of the factors tend to go up with it. Only generosity and perceptions of corruption appear to be negatively correlated with the ladder score.
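To put numbers on the trends seen in the pairplots, `DataFrame.corrwith` gives the Pearson correlation of each factor with the ladder score in one call. This is a minimal sketch on made-up values (not WHR 2021 data); in the notebook `df` would be `dataset` or `ds_sea`.

```python
import pandas as pd

FACTORS = ['Logged_GDP_per_capita', 'Social_support', 'Healthy_life_expectancy',
           'Freedom_to_make_life_choices', 'Generosity', 'Perceptions_of_corruption']

# Made-up numbers purely to exercise the code (NOT WHR 2021 values)
df = pd.DataFrame({
    'Ladder_score': [4.0, 5.0, 6.0, 7.0],
    'Logged_GDP_per_capita': [8.0, 9.0, 10.0, 11.0],
    'Social_support': [0.6, 0.7, 0.8, 0.9],
    'Healthy_life_expectancy': [58.0, 62.0, 66.0, 70.0],
    'Freedom_to_make_life_choices': [0.70, 0.80, 0.75, 0.90],
    'Generosity': [0.20, 0.10, 0.00, -0.10],
    'Perceptions_of_corruption': [0.90, 0.85, 0.70, 0.40],
})

# Pearson correlation of each factor with the ladder score, strongest (by |r|) first
r = df[FACTORS].corrwith(df['Ladder_score']).sort_values(key=abs, ascending=False)
print(r.round(2))

# Factors that move against the ladder score
print(r[r < 0].index.tolist())
```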

No.5b Confirm the hypothesis with a correlation heatmap.

In [None]:
figure = mpl.figure(figsize=(16,8))
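# numeric_only=True skips the text columns (country and region); pandas 2.0+ raises an error without it
sns.heatmap(dataset.corr(numeric_only=True), annot=True, cmap="Blues")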
mpl.show()

Now let's move to the SE Asia dataset.

In [None]:
figure = mpl.figure(figsize=(16,8))
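# numeric_only=True skips the text columns (country and region); pandas 2.0+ raises an error without it
sns.heatmap(ds_sea.corr(numeric_only=True), annot=True, cmap="Blues")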
mpl.show()

At this point we can see similar correlation patterns between the factors in Southeast Asia and around the world.
The factors strongly correlated with the ladder score are:

> 1. Logged GDP per capita,
> 2. Social support, and
> 3. Healthy life expectancy.

Freedom to make life choices seems to have a weak positive correlation with the ladder score, while generosity and perceptions of corruption are both negatively correlated with it. The perceptions of corruption result makes sense (the higher a country's ladder score, the lower its perceived corruption), but the negative generosity/ladder score correlation is an alarming note. Taken at face value, and keeping in mind that correlation is not causation, the preliminary data suggests that **Southeast Asian people tend to be happier when they're stingier towards others**, but more on that later.
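With only a handful of Southeast Asian countries, one or two nations can swing a correlation coefficient a long way. One way to gauge how surprising an observed r is, without assuming normality, is a permutation test: shuffle one variable many times and count how often the shuffled correlation is at least as extreme. The function below is a minimal sketch of that idea using only NumPy; `permutation_pvalue` is a hypothetical name, the nine data points are made up, and it assumes no missing values.

```python
import numpy as np


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for Pearson's r between x and y.

    Returns (observed_r, p_value), where p_value estimates how often a random
    re-pairing of x and y gives |r| at least as large as the observed |r|.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        r = np.corrcoef(x, rng.permutation(y))[0, 1]
        # Small tolerance so floating-point ties count as "at least as extreme"
        if abs(r) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return observed, (hits + 1) / (n_perm + 1)


# Made-up values for nine hypothetical countries (NOT WHR 2021 data)
generosity = [0.20, 0.10, 0.15, -0.05, 0.00, -0.10, 0.05, -0.15, -0.20]
freedom = [0.70, 0.80, 0.72, 0.90, 0.85, 0.88, 0.78, 0.93, 0.95]
r, p = permutation_pvalue(generosity, freedom)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

In the notebook the call would look like `permutation_pvalue(ds_sea['Generosity'], ds_sea['Freedom_to_make_life_choices'])`.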

Among the six factors in Southeast Asia's World Happiness Report 2021 data, the pairs with strong correlations (positive or negative) are:
> 1. Generosity/Freedom to Make Life Choices **(-0.83)**
> 2. Healthy Life Expectancy/Logged GDP per Capita **(0.87)**, and
> 3. Healthy Life Expectancy/Social Support **(0.87)**.
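Reading the strongest pairs off the heatmap by eye is error-prone. A minimal sketch for listing them programmatically is below. It visits each unordered column pair once (the upper triangle of the correlation matrix) and sorts by absolute value. `strongest_pairs` is a hypothetical helper, and the toy columns are made up.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(df: pd.DataFrame, top: int = 3) -> pd.Series:
    """Hypothetical helper: the `top` column pairs with the largest |Pearson r|."""
    corr = df.corr(numeric_only=True)  # text columns such as country names are skipped
    # combinations() yields each unordered pair once, i.e. the matrix's upper triangle
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for a, b in combinations(corr.columns, 2)})
    return pairs.sort_values(key=abs, ascending=False).head(top)


# Made-up columns: b is exactly 2 * a, c is only loosely related to both
toy = pd.DataFrame({
    'name': ['w', 'x', 'y', 'z'],
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],
    'c': [1.0, 3.0, 2.0, 4.0],
})
print(strongest_pairs(toy))
```

On the notebook's data, `strongest_pairs(ds_sea.drop(columns='Ladder_score'))` would restrict the ranking to factor/factor pairs.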

We also spot what may be an unhealthy pattern of individualism in these correlations, particularly in the Generosity/Freedom to Make Life Choices pair. It seems that **the less people care for others (and the less they share what they have), the more freedom they have to make their own life decisions**. Next, we'll see how the region stacks up against the world on these metrics.

No.6 Plot the world data for the significant correlations pointed out in Step 5, coloring each country by region.

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Ladder_score',y='Logged_GDP_per_capita',hue='Regional_indicator')
mpl.show()

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Ladder_score',y='Social_support',hue='Regional_indicator')
mpl.show()

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Ladder_score',y='Healthy_life_expectancy',hue='Regional_indicator')
mpl.show()

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Generosity',y='Freedom_to_make_life_choices',hue='Regional_indicator')
mpl.show()

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Healthy_life_expectancy',y='Logged_GDP_per_capita',hue='Regional_indicator')
mpl.show()

In [None]:
figure = mpl.figure(figsize=(16,8))
sns.scatterplot(data=dataset,x='Healthy_life_expectancy',y='Social_support',hue='Regional_indicator')
mpl.show()
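The scatter plots are easier to interpret alongside a number: how far Southeast Asia's average sits from the average across all countries. Below is a minimal sketch on made-up rows (the figures are placeholders, not WHR results). Note that `world_means` weights every country equally rather than by population.

```python
import pandas as pd

FACTORS = ['Ladder_score', 'Logged_GDP_per_capita', 'Social_support',
           'Healthy_life_expectancy']

# Made-up toy rows (NOT WHR 2021 data)
dataset = pd.DataFrame({
    'Regional_indicator': ['Southeast Asia', 'Southeast Asia', 'Western Europe',
                           'Western Europe', 'Sub-Saharan Africa', 'Sub-Saharan Africa'],
    'Ladder_score': [6.0, 5.0, 7.5, 7.0, 4.5, 4.0],
    'Logged_GDP_per_capita': [10.0, 9.0, 11.0, 10.8, 8.0, 7.6],
    'Social_support': [0.90, 0.80, 0.92, 0.90, 0.70, 0.60],
    'Healthy_life_expectancy': [68.0, 64.0, 73.0, 72.0, 56.0, 54.0],
})

# Average of each factor within every region
region_means = dataset.groupby('Regional_indicator')[FACTORS].mean()
# Average over all countries, each country counted once
world_means = dataset[FACTORS].mean()

# Positive = Southeast Asia above the global country average
gap = region_means.loc['Southeast Asia'] - world_means
print(gap.round(2))
```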

Looking at the scatter plots, we can see the following:
> 1. Southeast Asia sits in the middle of the Logged GDP per Capita/Ladder Score and Healthy Life Expectancy/Ladder Score plots. Both plots show one interesting outlier in the top right, the area where Western Europe and North America and ANZ sit.
> 2. Southeast Asia is one of the top-scoring regions for Social Support, with a decent Ladder Score. Three nations lag behind in Ladder Score but maintain a decent Social Support score.
> 3. Most nations exhibit low levels of generosity while maintaining high freedom to make life choices. However, two stark outlier nations exhibit both high generosity and high freedom to make life choices.
> 4. Southeast Asia is not far behind Western Europe, one of the best regions in terms of Logged GDP per Capita and Healthy Life Expectancy, and scores only slightly higher than the Middle East and Latin American regions. The same holds for the Social Support/Healthy Life Expectancy plot.
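To check claims like "one of the top-scoring regions" without eyeballing colors, regions can be ranked per factor. This minimal sketch uses the median per region, which is less sensitive than the mean to a single outlier country such as the one noted in point 1. The rows are made up, not WHR 2021 data.

```python
import pandas as pd

FACTORS = ['Ladder_score', 'Social_support', 'Healthy_life_expectancy']

# Made-up toy rows (NOT WHR 2021 data)
dataset = pd.DataFrame({
    'Regional_indicator': ['Southeast Asia', 'Southeast Asia', 'Western Europe',
                           'Western Europe', 'Sub-Saharan Africa', 'Sub-Saharan Africa'],
    'Ladder_score': [6.0, 5.0, 7.5, 7.0, 4.5, 4.0],
    'Social_support': [0.95, 0.93, 0.92, 0.90, 0.70, 0.60],
    'Healthy_life_expectancy': [68.0, 64.0, 73.0, 72.0, 56.0, 54.0],
})

# Median per region limits the pull of a single outlier country
region_medians = dataset.groupby('Regional_indicator')[FACTORS].median()

# Rank regions per factor: 1 = highest median; ties share the best rank
region_ranks = region_medians.rank(ascending=False, method='min').astype(int)
print(region_ranks.loc['Southeast Asia'])
```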

# Conclusion

Southeast Asia is renowned for its vibrant, cheerful populace. The WHR 2021 rankings might not fully capture that, but the data shows the region's strengths in:

> 1. Social support aspects - one of the world's best regions, if not the best, and
> 2. Balance between decent GDP, decent healthy life expectancy and freedom of making life decisions.

However, Generosity and Freedom to Make Life Choices show a strong negative correlation, which may be a sign of individualism creeping into the lives of Southeast Asian people. To counter the negative effects of this trait, Southeast Asian authorities could encourage people to communicate, even if only virtually. Living in such a challenging era (a pandemic and a changing socio-economic landscape) can be tough for most people and could lead people to compete with each other in unhealthy ways. Therefore, authorities could provide ways to promote healthy interaction between people in the region.
