## World Happiness Report 2021
<p>The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations.
    
<p><img src="https://www.newsfirst.lk/wp-content/uploads/2019/03/World-Happiness-Report.jpg" alt="World Happiness Report Logo">
    
Based on the dataset, there are **six factors** that could estimate the happiness score of each country:

- Logged GDP per capita: The GDP-per-capita time series using country-specific forecasts of real GDP growth.
- Social support: Social support refers to assistance or support provided by members of social networks to an individual.
- Healthy life expectancy: Healthy life expectancy is the average life in good health - that is to say without irreversible limitation of activity in daily life or incapacities - of a fictitious generation subject to the conditions of mortality and morbidity prevailing that year.
- Freedom to make life choices: Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
- Generosity: Generosity is the residual of regressing the national average of responses to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.
- Perceptions of corruption: The measure is the national average of the survey responses to two questions in the GWP: “Is corruption widespread throughout the government or not” and “Is corruption widespread within businesses or not?”
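
The generosity definition above is easiest to read as a regression residual. The sketch below illustrates the idea on made-up numbers: the arrays `log_gdp` and `donated_share` are hypothetical and are not values from the report. It fits a straight line with `np.polyfit` and keeps the part that the line does not explain.

```python
import numpy as np

# Hypothetical toy values, NOT taken from the report:
# a GDP measure and the national share of people who donated last month.
log_gdp = np.array([7.5, 8.2, 9.0, 9.8, 10.5, 11.0])
donated_share = np.array([0.20, 0.18, 0.30, 0.27, 0.45, 0.40])

# Fit donated_share ~ intercept + slope * log_gdp by ordinary least squares.
slope, intercept = np.polyfit(log_gdp, donated_share, deg=1)

# The residual is the part of the donation share that GDP does not explain.
generosity_residual = donated_share - (intercept + slope * log_gdp)

print(np.round(generosity_residual, 3))
```

A positive residual means a country donates more than its GDP level alone would suggest; a negative one means it donates less.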
    
In this notebook, we act as a policy think tank and investigate how these factors affect a country's happiness score. After exploring and cleaning the data, we would like to answer questions such as:</p>
<ul>
<li>Which is the happiest country in each region?</li>
<li>How does each factor affect a country's happiness?</li>
<li>Which regions average the lowest and highest ladder scores?</li>
</ul>

<p>The dataset we will use was taken from <a href="https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021/code">Kaggle</a>, which references its data from the Gallup World Poll. Let's take a look at our data and start analyzing!

# Import libraries

In [None]:
# import library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

sns.set(rc={'figure.figsize':(12,8)})
sns.set_style("whitegrid")
sns.color_palette("dark")
plt.style.use("fivethirtyeight")

# Read and Display Dataset

In [None]:
report_2021 = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report-2021.csv')

print('report_2021 data:')
display(report_2021.head())

In [None]:
report_2021.describe()

In [None]:
report_2021.info()
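
Before moving on, it can help to confirm that the table has no missing values or duplicated countries. The helper below is a minimal sketch: `data_quality_summary` is a hypothetical name, and the toy frame only stands in for `report_2021`.

```python
import pandas as pd

def data_quality_summary(df, key="Country name"):
    """Return missing-value counts per column and the number of duplicated keys."""
    missing = df.isna().sum()
    duplicated_keys = int(df[key].duplicated().sum())
    return missing, duplicated_keys

# Toy stand-in for report_2021 (values are made up for illustration).
toy = pd.DataFrame({
    "Country name": ["A", "B", "B", "C"],
    "Ladder score": [7.1, None, 5.2, 4.0],
})

missing, duplicated_keys = data_quality_summary(toy)
print(missing)
print("duplicated countries:", duplicated_keys)
```

Calling `data_quality_summary(report_2021)` on the real data should show whether any cleaning is needed before the analysis.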

# World Happiness Report 2021

In [None]:
#countries in the data
report_2021['Country name'].unique()

In [None]:
print('number of countries:')
report_2021['Country name'].nunique()

# Univariate Analysis

## Number of countries

In [None]:
plt.figure(figsize=(10,3),dpi=100)
plt.title('Number of Countries in Each Region', fontweight='bold')
ax = sns.countplot(y=report_2021["Regional indicator"],palette='deep',order = report_2021["Regional indicator"].value_counts().index)
ax.set_xlabel('Number of Countries')
ax.set_ylabel('')

for p in ax.patches:
    ax.annotate("%.f" % p.get_width(), xy=(p.get_width(), p.get_y()+p.get_height()/2),
            xytext=(5, 0), textcoords='offset points', ha="left", va="center",size='small')

plt.tight_layout()

## Happiest and Least Happy Countries

In [None]:
happiest = report_2021.sort_values(by='Ladder score', ascending=False).head()
less_happy = report_2021.sort_values(by='Ladder score', ascending=False).tail()

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
happy_less_happy = pd.concat([happiest, less_happy], sort=False)
happy_less_happy
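
The same top-and-bottom selection can be wrapped in a small reusable helper. This is only a sketch: `top_and_bottom` is a hypothetical name, the toy frame is made up, and it uses the pandas methods `nlargest` and `nsmallest` instead of sorting the whole table twice.

```python
import pandas as pd

def top_and_bottom(df, column, n=5):
    """Return the n highest and n lowest rows of `column`, sorted high to low."""
    top = df.nlargest(n, column)
    bottom = df.nsmallest(n, column).sort_values(column, ascending=False)
    return pd.concat([top, bottom])

# Toy stand-in for report_2021 (values are made up for illustration).
toy = pd.DataFrame({
    "Country name": list("ABCDEFGH"),
    "Ladder score": [7.8, 3.1, 6.4, 5.0, 2.5, 7.2, 4.4, 6.9],
})

result = top_and_bottom(toy, "Ladder score", n=2)
print(result["Country name"].tolist())  # → ['A', 'F', 'B', 'E']
```

With the real data, `top_and_bottom(report_2021, 'Ladder score', n=5)` should give the same ten rows as the cells above, as long as the table has at least `2 * n` rows so the two halves do not overlap.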

In [None]:
plt.figure(figsize=(14, 5));
plt.title('Top 5 and Bottom 5 Countries by Happiness Score',fontweight="bold");
ax = sns.barplot(y='Country name', x='Ladder score', data=happy_less_happy, palette = 'coolwarm');
ax.set_ylabel('')

for p in ax.patches:
    ax.annotate("%.2f" % p.get_width(), xy=(p.get_width(), p.get_y()+p.get_height()/2),
            xytext=(5, 0), textcoords='offset points', ha="left", va="center")
ax.set_xlabel('Happiness Score')
plt.tight_layout()
        

In [None]:
plt.figure(figsize=(14, 7));
plt.title('Top 5 and Bottom 5 Countries by Happiness Score, by Region',fontweight="bold");
ax = sns.barplot(y='Country name', x='Ladder score', data=happy_less_happy, palette = 'muted',hue='Regional indicator');
ax.set_ylabel('')

plt.text(4.1,5.50, 'The top 5 is dominated by Western European countries', 
        style='italic',
        bbox={'facecolor':'blue','alpha':0.2,'pad':5});
plt.text(4.1,6.34, 'The bottom 5 is dominated by Sub-Saharan African countries', 
        style='italic',
        bbox={'facecolor':'orange','alpha':0.2,'pad':5});
plt.text(4.1,7.19, 'However, the least happy country is in South Asia (Afghanistan)', 
        style='oblique', bbox={'facecolor':'green','alpha':0.2,'pad':5}
        );
plt.text(4.1,8.7, "There might be factors beyond regionality that affect peoples' happiness", 
        style='oblique', bbox={'facecolor':'red','alpha':0.2,'pad':5}
        );

for p in ax.patches:
    ax.annotate("%.2f" % p.get_width(), xy=(p.get_width(), p.get_y()+p.get_height()/2),
            xytext=(5, 0), textcoords='offset points', ha="left", va="center")

plt.legend([],[], frameon=False)

ax.set_xlabel('Happiness Score')    
plt.tight_layout()
        

### What do the top and bottom 10 lists tell us?

In [None]:
happiest_10 = report_2021.sort_values(by='Ladder score', ascending=False).head(10)
less_happy_10 = report_2021.sort_values(by='Ladder score', ascending=False).tail(10)
# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
happy_less_happy_10 = pd.concat([happiest_10, less_happy_10], sort=False)

In [None]:
plt.figure(figsize=(14, 7));
plt.title('Top 10 and Bottom 10 Countries by Happiness Score',fontweight="bold");
ax = sns.scatterplot(y='Country name', x='Ladder score', data=happy_less_happy_10, palette = 'muted',hue='Regional indicator',s=250);
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

## Ladder Score Analysis
From https://worldhappiness.report/faq/:
- The ladder question asks respondents to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale.

In [None]:
plt.figure(figsize=(16, 5));
plt.title('What Does the Distribution of Happiness Scores Around the World Look Like?',fontweight="bold");
# sns.distplot is deprecated; histplot with stat='density' keeps the same density scale
ax = sns.histplot(report_2021['Ladder score'], kde=True, stat='density',
             color='darkblue', edgecolor='black',
             line_kws={'linewidth': 4});
plt.axvline(report_2021["Ladder score"].mean(), c = "maroon",ls='--',lw=3);

plt.text(1.025 ,0.33, 'Most citizens around the world rate their happiness',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );       
plt.text(1.025 ,0.300, 'at a score of 5 to 6.5',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );     
plt.text(1.025 ,0.230, 'Happiness scores are widely spread, showing',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );   
plt.text(1.025 ,0.20, 'unequal happiness around the world',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );  

plt.tight_layout();
ax.set_xlabel('Happiness Score') ;       

# But what would the plot look like if we break it down by region?

In [None]:
report_2021.groupby('Regional indicator')['Ladder score'].describe().reset_index().sort_values('mean',ascending=False)

In [None]:
plt.figure(figsize=(15, 8),dpi=100);
plt.title('What Does the Happiness Score Look Like per Region?', fontweight="bold");
ax = sns.kdeplot(report_2021['Ladder score'], hue=report_2021['Regional indicator'],palette='muted', fill=True, linewidth=2,legend=True)

plt.text(0.160 ,0.122, 'People in Western Europe tend to be happier',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );       
plt.text(0.160 ,0.116, 'than people in other regions',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );   
plt.text(0.160 ,0.102, 'Most Sub-Saharan African countries are not as happy as other regions',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );     

plt.text(0.160 ,0.090, 'People in "Latin America & Caribbean" and "Central & Eastern',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 
plt.text(0.160 ,0.084, 'Europe" share a similar happiness score',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 

plt.text(0.160 ,0.070, 'North America & ANZ scores are concentrated',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 
plt.text(0.160 ,0.0644, 'at a higher score (above 6 to below 8)',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 

plt.text(4 ,0.0744, 'Sub-Saharan Africa',
        style='italic',
        color='black',
        ); 
plt.text(6.8 ,0.083, 'Western Europe',
        style='italic',
        color='black',
        ); 
plt.text(1.7 ,0.0045, 'South Asia',
        style='italic',
        color='black',
        ); 

plt.axvline(report_2021["Ladder score"].median(), c = "maroon",ls='--',lw=3);
plt.text(5.42 ,0.096, 'happiness score median',
        style='italic',
        color='black',
         rotation=90
        ); 
ax.set_xlabel('Happiness Score')

plt.tight_layout()       

In [None]:
plt.figure(figsize=(15, 8),dpi=100);
plt.title('Cumulative Distribution of Ladder Score per Region', fontweight="bold");
ax = sns.kdeplot(report_2021['Ladder score'], hue=report_2021['Regional indicator'],palette='muted', fill=True, linewidth=2,legend=True,cumulative=True)

plt.text(0.160 ,0.210, 'People in Western Europe tend to be happier',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );       
plt.text(0.160 ,0.20, 'than people in other regions',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );   
plt.text(0.160 ,0.18, 'Most Sub-Saharan African countries are not as happy as other regions',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        );     

plt.text(0.160 ,0.16, 'People in "Latin America & Caribbean" and "Central & Eastern',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 
plt.text(0.160 ,0.15, 'Europe" share a similar happiness score',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 

plt.text(0.160 ,0.13, 'North America & ANZ scores are concentrated',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 
plt.text(0.160 ,0.12, 'at a higher score (above 6 to below 8)',
        style='italic',
        bbox={'facecolor': 'red', 'alpha': 0.1, 'pad': 3},
        ); 

plt.text(5.2 ,0.15, 'Sub-Saharan Africa',
        style='italic',
        color='black',
        ); 
plt.text(8 ,0.143, 'Western Europe',
        style='italic',
        color='black',
        ); 
plt.text(1.7 ,0.0045, 'South Asia',
        style='italic',
        color='black',
        ); 

ax.set_xlabel('Happiness Score')
plt.tight_layout()       

In [None]:
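# numeric_only=True skips text columns such as 'Country name' (required in pandas 2.x)
regional = report_2021.groupby(['Regional indicator']).mean(numeric_only=True).reset_index().sort_values('Ladder score',ascending=False)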

In [None]:
plt.figure(figsize=(14,7))
plt.title("What Is the Average Happiness Score per Region?",color='black',fontsize=14,fontweight='bold')
ax = sns.barplot(data=regional, y='Regional indicator', x='Ladder score', palette='deep')
ax.set_ylabel('')
ax.set_xlabel('Happiness Score')

for p in ax.patches:
    ax.annotate("%.2f" % p.get_width(), xy=(p.get_width(), p.get_y()+p.get_height()/2),
            xytext=(5, 0), textcoords='offset points', ha="left", va="center")
ax.set_xlabel('Happiness Score')

In [None]:
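# numeric_only=True skips text columns such as 'Country name' (required in pandas 2.x)
regional_median = report_2021.groupby(['Regional indicator']).median(numeric_only=True).reset_index().sort_values('Ladder score',ascending=False)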

In [None]:
plt.figure(figsize=(14,7))
plt.title("What Is the Median Happiness Score per Region?",color='black',fontsize=14,fontweight='bold')
ax = sns.barplot(data=regional_median, y='Regional indicator', x='Ladder score', palette='deep')
ax.set_ylabel('')


for p in ax.patches:
    ax.annotate("%.2f" % p.get_width(), xy=(p.get_width(), p.get_y()+p.get_height()/2),
            xytext=(5, 0), textcoords='offset points', ha="left", va="center")
ax.set_xlabel('Happiness Score')
plt.tight_layout()       
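
The mean and median charts above are easier to compare when both statistics sit in one table. The following sketch assumes the same column names as the dataset; `regional_mean_median` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def regional_mean_median(df, region_col="Regional indicator", score_col="Ladder score"):
    """Mean, median, and their gap per region, sorted by mean (high to low)."""
    summary = df.groupby(region_col)[score_col].agg(["mean", "median"])
    summary["mean_minus_median"] = summary["mean"] - summary["median"]
    return summary.sort_values("mean", ascending=False)

# Toy stand-in for report_2021 (values are made up for illustration).
toy = pd.DataFrame({
    "Regional indicator": ["R1", "R1", "R1", "R2", "R2", "R2"],
    "Ladder score": [7.0, 7.2, 7.4, 3.0, 5.0, 5.5],
})

summary = regional_mean_median(toy)
print(summary)
```

A clearly negative `mean_minus_median` hints that a few low-scoring countries pull a region's average down, which is one reason to look at the median as well.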


# Which features determine the happiness score the most?

In [None]:
#trim the dataset for easier corr plot
corr_plot = report_2021.copy()
corr_plot.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual'], inplace=True, axis=1)
corr_plot

In [None]:
# numeric_only=True skips the text column 'Country name' (required in pandas 2.x)
corr_plot = corr_plot.groupby(['Regional indicator']).mean(numeric_only=True).reset_index()

In [None]:
corr_plot = corr_plot.sort_values(['Ladder score'], ascending=False)

In [None]:
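# Note: after the groupby above, these correlations are computed across the
# regional averages (one row per region), not across individual countries.
corr = corr_plot.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr)

# Use the upper-triangle matrix as a mask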
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness (Ladder) Score?', fontweight='bold')
sns.heatmap(corr, annot=True, mask=matrix);

plt.text(3.50 ,0.70, 'Most Correlated:',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );  

plt.text(3.50 ,1.0, '1. Social Support',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );  

plt.text(3.50 ,1.30, '2. Logged GDP per Capita',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );
plt.text(3.50 ,1.60, '3. Healthy Life Expectancy',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );

plt.text(3.50 ,2.20, 'Further Analysis:',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );
plt.text(3.50 ,2.50, '- Between Healthy Life Expectancy and Logged GDP',
        style='italic',
        size='large',
        weight='bold',
        bbox={'facecolor': 'red', 'alpha': 0.2, 'pad': 5},
        );


plt.text(1.70 ,3.35, '<------------',
        style='italic',
        size='large',
        weight='bold',
        color='red',
        rotation=45);

plt.text(2.15 ,2.90, 'highest correlation between features',
        size='small',
        color='red',
        weight='bold'       
        );

plt.tight_layout()       
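
Reading the strongest correlations off a heatmap by eye is error-prone, so a sorted ranking can serve as a cross-check. The sketch below is an illustration under assumptions: `rank_by_correlation` is a hypothetical helper, the toy values are made up, and it ranks by absolute Pearson correlation on whatever rows it is given (country-level rows or regional averages).

```python
import pandas as pd

def rank_by_correlation(df, target="Ladder score"):
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute strength so strong negative correlations are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Toy stand-in for the trimmed dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E"],
    "Ladder score": [3.0, 4.0, 5.0, 6.0, 7.0],
    "Social support": [0.50, 0.60, 0.70, 0.80, 0.90],
    "Perceptions of corruption": [0.9, 0.8, 0.8, 0.5, 0.3],
    "Generosity": [0.1, -0.1, 0.2, 0.0, 0.1],
})

ranking = rank_by_correlation(toy)
print(ranking)
```

Correlation only measures linear association, so a high rank here does not by itself show that a feature determines happiness.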

# GDP X Healthy Life Expectancy

In [None]:
plt.figure(figsize=(17,7))
plt.title('GDP x Healthy Life Expectancy Score Around the World by Region', fontweight='bold')
# keep the returned Axes so the axis labels below apply to this figure
ax = sns.scatterplot(data=report_2021, x="Logged GDP per capita", y="Healthy life expectancy", hue="Regional indicator",palette='muted',s=200);
plt.legend(loc='best',
          ncol=2, fontsize =12);
ax.set_xlabel('Logged GDP per Capita')
ax.set_ylabel('Healthy Life Expectancy')

plt.tight_layout()        
plt.text(10.1, 53, "Western Europe countries tend to score higher", size='medium',color='maroon', weight='semibold',)
plt.text(10.1, 51, "Sub-Saharan Africa countries tend to score lower", size='medium',color='maroon', weight='semibold',)

# GDP-based Annotation
plt.text(11.35, 75.5, "Singapore", size='small',color='maroon', weight='semibold',)
plt.text(11.48, 71.3, "Luxembourg", size='small',color='maroon', weight='semibold',)
plt.text(11.25, 71, "Ireland", size='small',color='maroon', weight='semibold',)
plt.text(6.55, 52, "Burundi", size='small',color='maroon', weight='semibold',)

# Healthy life expectancy-based Annotation
plt.text(10.87, 75.5, "Hong Kong", size='small',color='maroon', weight='semibold',)
plt.text(7.18, 48.50, "Chad", size='small',color='maroon', weight='semibold',)
plt.text(7.66, 48.50, "Lesotho", size='small',color='maroon', weight='semibold',)

In [None]:
report_2021.columns

In [None]:
report_2021[['Country name','Logged GDP per capita','Regional indicator']].sort_values('Logged GDP per capita',ascending=False)

In [None]:
report_2021[['Country name','Healthy life expectancy','Regional indicator']].sort_values('Healthy life expectancy',ascending=False)

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10, 5), dpi=100)
plt.title('Breakdown of Logged GDP per Capita Score in Each Region', fontweight='bold')
ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
ax = sns.swarmplot(y="Regional indicator", x="Logged GDP per capita", data=report_2021,palette='muted');
ax.set_ylabel('')

plt.text(8.8 ,0.1, "concentrated at higher score->", size='x-small',color='maroon', weight='semibold')
plt.text(10.5 ,6.3, "much higher score in its region", size='x-small',color='maroon', weight='semibold')

plt.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10, 5), dpi=100)
plt.title('Breakdown of Healthy life expectancy Score in Each Region', fontweight='bold')
ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
ax = sns.swarmplot(y="Regional indicator", x="Healthy life expectancy", data=report_2021,palette='muted');
ax.set_ylabel('')
plt.text(+62.7 ,0.1, "concentrated at higher score->", size='small',color='maroon', weight='semibold');
plt.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10, 5), dpi=75)
plt.title('Healthy Life Expectancy Score in Each Region', fontweight='bold')
ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
plt.text(61.5 ,0.1, "concentrated at higher score->", size='small',color='maroon', weight='semibold')
ax = sns.boxplot(y="Regional indicator", x="Healthy life expectancy", data=report_2021, palette='muted')
ax.set_ylabel('');

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10, 5), dpi=75)
plt.title('Logged GDP per Capita Score in Each Region', fontweight='bold')
ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
ax = sns.boxplot(y="Regional indicator", x="Logged GDP per capita", data=report_2021,palette='muted')
plt.text(8.5 ,0.1, "concentrated at higher score->", size='small',color='maroon', weight='semibold')
ax.set_ylabel('');

# Analysis by Region

In [None]:
report_2021['Regional indicator'].unique()

## Western Europe

In [None]:
western_europe = report_2021[report_2021['Regional indicator']=='Western Europe']
western_europe.head(2)

In [None]:
#trim the dataset for easier corr plot
westeu_clean = western_europe.copy()
westeu_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
westeu_clean.head(2)

In [None]:
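# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_westeu = westeu_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_westeu)

# Use the upper-triangle matrix as a mask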
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in Western Europe?', fontweight='bold')
sns.heatmap(corr_westeu, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_westeu = corr_westeu[['Ladder score']].reset_index()
line_westeu

In [None]:
pivot_westeu = pd.pivot_table(line_westeu, values='Ladder score', columns='index').reset_index()
pivot_westeu = pivot_westeu.rename_axis(None, axis=1)  
pivot_westeu['index'] = 'Western Europe' 
pivot_westeu = pivot_westeu.rename(columns={'index':'Region'})
pivot_westeu
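
The filter, trim, correlate, and pivot steps used for Western Europe are repeated for every region in the sections that follow. As a possible shortcut, the sketch below builds the same kind of one-row-per-region table in a single loop. `ladder_correlations_by_region` is a hypothetical helper, not part of the original notebook, and the toy values are made up.

```python
import pandas as pd

def ladder_correlations_by_region(df, region_col="Regional indicator", target="Ladder score"):
    """One row per region: correlation of each numeric feature with `target`."""
    rows = {}
    for region, group in df.groupby(region_col):
        numeric = group.select_dtypes(include="number")
        rows[region] = numeric.corr()[target]
    result = pd.DataFrame(rows).T
    result.index.name = "Region"
    return result.reset_index()

# Toy stand-in for the trimmed dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Country name": list("ABCDEF"),
    "Regional indicator": ["R1", "R1", "R1", "R2", "R2", "R2"],
    "Ladder score": [5.0, 6.0, 7.0, 3.0, 4.0, 5.0],
    "Social support": [0.7, 0.8, 0.9, 0.6, 0.5, 0.4],
})

table = ladder_correlations_by_region(toy)
print(table)
```

Regions with only a handful of countries give unstable correlations, so the resulting rows are best read as rough indications rather than firm estimates.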

## North America and ANZ

In [None]:
north_america_anz = report_2021[report_2021['Regional indicator']=='North America and ANZ']
north_america_anz.head(2)

In [None]:
#trim the dataset for easier corr plot
north_america_anz_clean = north_america_anz.copy()
north_america_anz_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
north_america_anz_clean.head(10)

In [None]:
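# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_north_america_anz = north_america_anz_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_north_america_anz)

# Use the upper-triangle matrix as a mask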
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in North America and ANZ?', fontweight='bold')
sns.heatmap(corr_north_america_anz, annot=True, mask=matrix);
plt.tight_layout()     
plt.show()

In [None]:
line_north_america_anz = corr_north_america_anz[['Ladder score']].reset_index()
line_north_america_anz

In [None]:
pivot_north_america_anz = pd.pivot_table(line_north_america_anz, values='Ladder score', columns='index').reset_index()
pivot_north_america_anz = pivot_north_america_anz.rename_axis(None, axis=1)  
pivot_north_america_anz['index'] = 'North America and ANZ' 
pivot_north_america_anz = pivot_north_america_anz.rename(columns={'index':'Region'})
pivot_north_america_anz

## Middle East and North Africa

In [None]:
MENA = report_2021[report_2021['Regional indicator']=='Middle East and North Africa']
MENA.head(2)

In [None]:
#trim the dataset for easier corr plot
MENA_clean = MENA.copy()
MENA_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
MENA_clean.head(2)

In [None]:
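# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_MENA = MENA_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_MENA)

# Use the upper-triangle matrix as a mask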
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in Middle East and North Africa?', fontweight='bold')
sns.heatmap(corr_MENA, annot=True, mask=matrix);

plt.tight_layout()       
plt.show()

In [None]:
line_MENA = corr_MENA[['Ladder score']].reset_index()
line_MENA

In [None]:
pivot_MENA = pd.pivot_table(line_MENA, values='Ladder score', columns='index').reset_index()
pivot_MENA = pivot_MENA.rename_axis(None, axis=1)  
pivot_MENA['index'] = 'Middle East and North Africa' 
pivot_MENA = pivot_MENA.rename(columns={'index':'Region'})
pivot_MENA

In [None]:
pivot_MENA.columns

## Latin America and Caribbean

In [None]:
latin_america_caribbean = report_2021[report_2021['Regional indicator']=='Latin America and Caribbean']
latin_america_caribbean.head(2)

In [None]:
#trim the dataset for easier corr plot
latin_america_caribbean_clean = latin_america_caribbean.copy()
latin_america_caribbean_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
latin_america_caribbean_clean.head(2)

In [None]:
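# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_latin_america_caribbean = latin_america_caribbean_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_latin_america_caribbean)

# Use the upper-triangle matrix as a mask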
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in Latin America and Caribbean?', fontweight='bold')
sns.heatmap(corr_latin_america_caribbean, annot=True, mask=matrix);

plt.tight_layout()      
plt.show()

In [None]:
line_latin_america_caribbean = corr_latin_america_caribbean[['Ladder score']].reset_index()
line_latin_america_caribbean

In [None]:
pivot_LAC = pd.pivot_table(line_latin_america_caribbean, values='Ladder score', columns='index').reset_index()
pivot_LAC = pivot_LAC.rename_axis(None, axis=1)  
pivot_LAC['index'] = 'Latin America and Caribbean' 
pivot_LAC = pivot_LAC.rename(columns={'index':'Region'})
pivot_LAC

## Central and Eastern Europe

In [None]:
central_and_eastern_europe = report_2021[report_2021['Regional indicator']=='Central and Eastern Europe']
central_and_eastern_europe.head(2)

In [None]:
#trim the dataset for easier corr plot
CEE_clean = central_and_eastern_europe.copy()
CEE_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
CEE_clean.head(2)

In [None]:
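# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_CEE = CEE_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_CEE)

# Use the upper-triangle matrix as a mask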
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in Central and Eastern Europe?', fontweight='bold')
sns.heatmap(corr_CEE, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_CEE = corr_CEE[['Ladder score']].reset_index()
line_CEE

In [None]:
pivot_CEE = pd.pivot_table(line_CEE, values='Ladder score', columns='index').reset_index()
pivot_CEE = pivot_CEE.rename_axis(None, axis=1)  
pivot_CEE['index'] = 'Central and Eastern Europe' 
pivot_CEE = pivot_CEE.rename(columns={'index':'Region'})
pivot_CEE

## East Asia

In [None]:
east_asia = report_2021[report_2021['Regional indicator']=='East Asia']
east_asia.head(2)

In [None]:
#trim the dataset for easier corr plot
east_asia_clean = east_asia.copy()
east_asia_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
east_asia_clean.head(2)

In [None]:
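# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_east_asia = east_asia_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_east_asia)

# Use the upper-triangle matrix as a mask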
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in East Asia?', fontweight='bold')
sns.heatmap(corr_east_asia, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_east_asia = corr_east_asia[['Ladder score']].reset_index()
line_east_asia

In [None]:
pivot_east_asia = pd.pivot_table(line_east_asia, values='Ladder score', columns='index').reset_index()
pivot_east_asia = pivot_east_asia.rename_axis(None, axis=1)  
pivot_east_asia['index'] = 'East Asia' 
pivot_east_asia = pivot_east_asia.rename(columns={'index':'Region'})
pivot_east_asia

## ASEAN

In [None]:
asean = report_2021[report_2021['Regional indicator']=='Southeast Asia']
asean.head(2)

In [None]:
#trim the dataset for easier corr plot
asean_clean = asean.copy()
asean_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
asean_clean.head(2)

In [None]:
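# keep numeric columns only ('Country name' is text and breaks corr() in pandas 2.x)
corr_asean = asean_clean.select_dtypes(include='number').corr()

# Get the upper triangle of the correlation matrix
matrix = np.triu(corr_asean)

# Use the upper-triangle matrix as a mask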
plt.figure(figsize=(20,12));
plt.title('What Feature is Correlated the Most to Happiness Score in ASEAN?', fontweight='bold')
sns.heatmap(corr_asean, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_asean = corr_asean[['Ladder score']].reset_index()
line_asean

In [None]:
pivot_ASEAN = pd.pivot_table(line_asean, values='Ladder score', columns='index').reset_index()
pivot_ASEAN = pivot_ASEAN.rename_axis(None, axis=1)  
pivot_ASEAN['index'] = 'Southeast Asia' 
pivot_ASEAN = pivot_ASEAN.rename(columns={'index':'Region'})
pivot_ASEAN

In [None]:
asean

In [None]:
plt.figure(figsize=(14,6))
plt.title('Happiness Score in ASEAN Countries',weight='bold')
ax = sns.barplot(x="Country name", y="Ladder score", data=asean, palette='muted')
for p in ax.patches:
        ax.annotate('{:.2f}'.format(p.get_height()), (p.get_x()+0.3, p.get_height()+0.05))
ax.set_ylabel('Happiness Score');
ax.set_xlabel('');


plt.tight_layout()       
plt.show()

In [None]:
plt.figure(figsize=(15,8))
plt.title('Which ASEAN Countries Have the Best GDP and Healthy Life Expectancy Scores?', fontweight='bold')
sns.scatterplot(data=asean, x="Healthy life expectancy", y="Logged GDP per capita", hue="Country name",
               palette="deep", sizes=(20, 200), legend="full",s=100);

# add text annotation
plt.text(77 ,11.3, "Singapore", size='medium', color='black', weight='semibold')
plt.text(68.3 ,8.85, "Vietnam", size='medium', color='black', weight='semibold')
plt.text(67.7 ,9.68, "Thailand", size='medium', color='black', weight='semibold')
plt.text(67.31 ,10.15, "Malaysia", size='medium', color='black', weight='semibold')
plt.text(62.45 ,9.40, "Indonesia", size='medium', color='black', weight='semibold')
plt.text(62.26 ,9, "Philippines", size='medium', color='black', weight='semibold')
plt.text(62.16 ,8.41, "Cambodia", size='medium', color='black', weight='semibold')
plt.text(59.60 ,8.42, "Myanmar", size='medium', color='black', weight='semibold')
plt.text(59.2 ,8.93, "Laos", size='medium', color='black', weight='semibold')

plt.text(72.9 ,11.1, "Singapore has the highest score", color='maroon', weight='semibold')
plt.text(72.6 ,11, "on these two most correlated features", color='maroon', weight='semibold')

plt.text(70 ,10.2, "Malaysia, Thailand, and Vietnam have", color='maroon', weight='semibold')
plt.text(70 ,10.1, 'medium scores on these criteria', color='maroon', weight='semibold')

plt.text(60 ,9.7, "Indonesia, Philippines, Cambodia, Laos,", color='maroon', weight='semibold')
plt.text(60 ,9.6, 'and Myanmar are among the lowest', color='maroon', weight='semibold')

plt.tight_layout()       
plt.show()
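
The country labels above are placed with hand-typed coordinates, which have to be retyped whenever the data changes. As a minimal sketch of an alternative, the labels can be read from the DataFrame itself and offset by a few points. The `demo` frame below holds made-up values purely for illustration, and plain matplotlib is used here instead of seaborn to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only (not taken from the report)
demo = pd.DataFrame({
    "Country name": ["A", "B"],
    "Healthy life expectancy": [60.0, 70.0],
    "Logged GDP per capita": [8.5, 10.0],
})
x_col, y_col = "Healthy life expectancy", "Logged GDP per capita"

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(demo[x_col], demo[y_col], s=100)

# Label every point from its own row, shifted 6 points up and to the right
labels = []
for _, row in demo.iterrows():
    labels.append(
        ax.annotate(row["Country name"], (row[x_col], row[y_col]),
                    xytext=(6, 6), textcoords="offset points", weight="semibold")
    )
fig.tight_layout()
```

With the notebook's data, the same loop could run over `asean` instead of `demo`; overlapping labels may still need manual adjustment.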

In [None]:
plt.figure(figsize=(15,8))
plt.title('Which ASEAN Countries Have the Best Healthy Life Expectancy and Social Support Scores?', fontweight='bold')
sns.scatterplot(data=asean, x="Healthy life expectancy", y="Social support", hue="Country name",
               palette="deep", sizes=(20, 200), legend="full",s=100);

# add text annotation
plt.text(76 ,0.905, "Singapore", size='medium', color='black', weight='semibold');

plt.text(67.7 ,0.885, "Thailand", size='medium', color='black', weight='semibold');
plt.text(68.3 ,0.845, "Vietnam", size='medium', color='black', weight='semibold');
plt.text(67.4 ,0.810, "Malaysia", size='medium', color='black', weight='semibold');

plt.text(61.6 ,0.835, "Philippines", size='medium', color='black', weight='semibold');
plt.text(61.8 ,0.800, "Indonesia", size='medium', color='black', weight='semibold');
plt.text(62.2 ,0.760, "Cambodia", size='medium', color='black', weight='semibold');
plt.text(59.5 ,0.780, "Myanmar", size='medium', color='black', weight='semibold');
plt.text(59.2 ,0.725, "Laos", size='medium', color='black', weight='semibold');

plt.text(70.1 ,0.750, "In ASEAN, there seem to be 3 clusters of countries:", size='medium', color='maroon', weight='semibold')
plt.text(70.1 ,0.744, "well-off, standard, and lower", size='medium', color='maroon', weight='semibold')

plt.tight_layout()       
plt.show()

## Commonwealth of Independent States

In [None]:
CIS = report_2021[report_2021['Regional indicator']=='Commonwealth of Independent States']
CIS.head(2)

In [None]:
#trim the dataset for easier corr plot
CIS_clean = CIS.copy()
CIS_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
CIS_clean.head(10)

In [None]:
# correlate numeric columns only (the text column 'Country name' is skipped)
corr_CIS = CIS_clean.select_dtypes(include='number').corr()

# upper triangle of the correlation matrix, used as the heatmap mask
matrix = np.triu(corr_CIS)

plt.figure(figsize=(20,12))
plt.title('Which Feature Is Most Correlated with Happiness Score in the Commonwealth of Independent States?', fontweight='bold')
sns.heatmap(corr_CIS, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_CIS = corr_CIS[['Ladder score']].reset_index()
line_CIS

In [None]:
pivot_CIS = pd.pivot_table(line_CIS, values='Ladder score', columns='index').reset_index()
pivot_CIS = pivot_CIS.rename_axis(None, axis=1)  
pivot_CIS['index'] = 'Commonwealth of Independent States' 
pivot_CIS = pivot_CIS.rename(columns={'index':'Region'})
pivot_CIS

## Sub-Saharan Africa

In [None]:
sub_saharan_africa = report_2021[report_2021['Regional indicator']=='Sub-Saharan Africa']
sub_saharan_africa.head(2)

In [None]:
#trim the dataset for easier corr plot
ssa_clean = sub_saharan_africa.copy()
ssa_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
ssa_clean.head(2)

In [None]:
# correlate numeric columns only (the text column 'Country name' is skipped)
corr_ssa = ssa_clean.select_dtypes(include='number').corr()

# upper triangle of the correlation matrix, used as the heatmap mask
matrix = np.triu(corr_ssa)

plt.figure(figsize=(20,12))
plt.title('Which Feature Is Most Correlated with Happiness Score in Sub-Saharan Africa?', fontweight='bold')
sns.heatmap(corr_ssa, annot=True, mask=matrix);
plt.tight_layout()       
plt.show()

In [None]:
line_SSA = corr_ssa[['Ladder score']].reset_index()
line_SSA

In [None]:
pivot_SSA = pd.pivot_table(line_SSA, values='Ladder score', columns='index').reset_index()
pivot_SSA = pivot_SSA.rename_axis(None, axis=1)  
pivot_SSA['index'] = 'Sub-Saharan Africa' 
pivot_SSA = pivot_SSA.rename(columns={'index':'Region'})
pivot_SSA

## South Asia

In [None]:
south_asia = report_2021[report_2021['Regional indicator']=='South Asia']
south_asia.head(2)

In [None]:
#trim the dataset for easier corr plot
southasia_clean = south_asia.copy()
southasia_clean.drop(['Explained by: Generosity','Explained by: Social support','Explained by: Log GDP per capita','Explained by: Perceptions of corruption', 
                 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices','upperwhisker','lowerwhisker','Standard error of ladder score',
               'Ladder score in Dystopia','Dystopia + residual','Regional indicator'], inplace=True, axis=1)
southasia_clean.head(2)

In [None]:
# correlate numeric columns only (the text column 'Country name' is skipped)
corr_southasia = southasia_clean.select_dtypes(include='number').corr()

# upper triangle of the correlation matrix, used as the heatmap mask
matrix = np.triu(corr_southasia)

plt.figure(figsize=(20,12))
plt.title('Which Feature Is Most Correlated with Happiness Score in South Asia?', fontweight='bold')
sns.heatmap(corr_southasia, annot=True, mask=matrix);

plt.tight_layout()       
plt.show()

In [None]:
line_southasia = corr_southasia[['Ladder score']]
line_southasia = line_southasia.reset_index()
line_southasia

In [None]:
pivot_southasia = pd.pivot_table(line_southasia, values='Ladder score', columns='index').reset_index()
pivot_southasia = pivot_southasia.rename_axis(None, axis=1)  
pivot_southasia['index'] = 'South Asia' 
pivot_southasia = pivot_southasia.rename(columns={'index':'Region'})
pivot_southasia
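
Each regional section above repeats the same steps: filter one region, compute the correlation matrix, keep the `Ladder score` column, and turn it into a one-row table labelled with the region. A minimal sketch of a helper that does this in one call follows. The function name `region_correlation_row` and the `demo` frame are hypothetical and use made-up numbers; only the column names come from the notebook.

```python
import pandas as pd

def region_correlation_row(df, region, target="Ladder score",
                           region_col="Regional indicator"):
    """Return a one-row DataFrame of correlations with `target` for one region."""
    subset = df[df[region_col] == region]
    # Keep numeric columns only so text columns such as country names are ignored
    corr = subset.select_dtypes(include="number").corr()[target]
    # Turn the Series into a single row and label it with the region name
    row = corr.to_frame().T.reset_index(drop=True)
    row.insert(0, "Region", region)
    return row

# Tiny made-up frame, for illustration only (not real report data)
demo = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Regional indicator": ["X", "X", "X", "Y"],
    "Ladder score": [1.0, 2.0, 3.0, 4.0],
    "Social support": [2.0, 4.0, 6.0, 1.0],
    "Generosity": [3.0, 2.0, 1.0, 0.5],
})
row_x = region_correlation_row(demo, "X")
print(row_x)
```

Under these assumptions, the rows for all regions could be built in a loop and passed to `pd.concat`. Note that the row label is the region name as stored in the data, whereas the cells above assign display names by hand (for example 'South East Asia').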

# Region Correlation Bar Plots

In [None]:
report_2021['Regional indicator'].unique()

In [None]:
region_correlation = pd.concat([pivot_southasia, pivot_SSA, pivot_CIS, pivot_ASEAN, pivot_east_asia, pivot_CEE, pivot_LAC, 
                                pivot_MENA, pivot_north_america_anz, pivot_westeu], ignore_index=True)
region_correlation.drop(['Ladder score'],inplace=True, axis=1)
region_correlation

In [None]:
region_correlation.columns

In [None]:
region_cor_plot = region_correlation[['Region', 'Freedom to make life choices', 'Generosity',
       'Healthy life expectancy', 'Logged GDP per capita',
       'Perceptions of corruption', 'Social support']].set_index('Region')
region_cor_plot

In [None]:
# use the absolute value of each correlation so that only its strength is compared
region_cor_plot = region_cor_plot[['Freedom to make life choices', 'Generosity',
       'Healthy life expectancy', 'Logged GDP per capita',
       'Perceptions of corruption', 'Social support']].abs()

In [None]:
region_cor_plot
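
Taking absolute values makes the strengths comparable but hides whether a correlation is positive or negative. As a rough sketch, the strongest indicator per region can be picked with `idxmax` while the sign is recovered from the original values. The `demo_cor` table below contains made-up numbers and stands in for the table before `abs` is applied.

```python
import pandas as pd

# Made-up correlation values, for illustration only
demo_cor = pd.DataFrame(
    {"Generosity": [-0.6, 0.1], "Social support": [0.4, 0.8]},
    index=pd.Index(["Region A", "Region B"], name="Region"),
)

abs_cor = demo_cor.abs()
# Column name of the largest absolute correlation in each row
strongest = abs_cor.idxmax(axis=1)

summary = pd.DataFrame({
    "indicator": strongest,
    "abs_corr": abs_cor.max(axis=1),
    # Look up the signed value so the direction is not lost
    "signed_corr": [demo_cor.loc[region, col] for region, col in strongest.items()],
})
print(summary)
```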

In [None]:
# note: `south_asia` and `asean` are rebound here; earlier they held country-level DataFrames
south_asia = region_cor_plot.loc['South Asia']
sub_sahara = region_cor_plot.loc['Sub-Saharan Africa']
commonwealth = region_cor_plot.loc['Commonwealth of Independent States']

asean = region_cor_plot.loc['South East Asia']
east_asia = region_cor_plot.loc['East Asia']
central_europe = region_cor_plot.loc['Central and Eastern Europe']
latin = region_cor_plot.loc['Latin America and Caribbean']

middle_east = region_cor_plot.loc['Middle east and North Africa']
north_america = region_cor_plot.loc['North America and ANZ']
west_europe = region_cor_plot.loc['Western Europe']

In [None]:
fig, axes = plt.subplots(5, 2, figsize=(25, 25), sharey=False)
fig.suptitle('Indicator Correlation with Happiness Score per Region',fontweight='bold',fontsize='xx-large')

# South Asia
sns.barplot(ax=axes[0,0], x=south_asia.index, y=south_asia.values)
axes[0,0].set_title(south_asia.name)

# Sub-Saharan Africa
sns.barplot(ax=axes[0,1], x=sub_sahara.index, y=sub_sahara.values)
axes[0,1].set_title(sub_sahara.name)

# Commonwealth Of Independent States
sns.barplot(ax=axes[1,0], x=commonwealth.index, y=commonwealth.values)
axes[1,0].set_title(commonwealth.name)


# East Asia
sns.barplot(ax=axes[1,1], x=east_asia.index, y=east_asia.values)
axes[1,1].set_title(east_asia.name)

# Central and Eastern Europe
sns.barplot(ax=axes[2,0], x=central_europe.index, y=central_europe.values)
axes[2,0].set_title(central_europe.name)

# Latin America and Caribbean
sns.barplot(ax=axes[2,1], x=latin.index, y=latin.values)
axes[2,1].set_title(latin.name)



# Middle East and North Africa
sns.barplot(ax=axes[3,0], x=middle_east.index, y=middle_east.values)
axes[3,0].set_title(middle_east.name)

# North America and ANZ
sns.barplot(ax=axes[3,1], x=north_america.index, y=north_america.values)
axes[3,1].set_title(north_america.name)

# Western Europe
sns.barplot(ax=axes[4,0], x=west_europe.index, y=west_europe.values)
axes[4,0].set_title(west_europe.name)

# ASEAN
sns.barplot(ax=axes[4,1], x=asean.index, y=asean.values)
axes[4,1].set_title(asean.name)



plt.tight_layout()       
plt.show()
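
The grid above calls `sns.barplot` once per region by hand. A minimal sketch of the same layout written as a loop over the rows of a correlation table is shown below. The `demo` table uses made-up numbers, and plain matplotlib bars replace `sns.barplot` here to keep the example small; both choices are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up absolute correlations, for illustration only
demo = pd.DataFrame(
    {"Generosity": [0.2, 0.5, 0.1], "Social support": [0.7, 0.3, 0.9]},
    index=["Region A", "Region B", "Region C"],
)

ncols = 2
nrows = -(-len(demo) // ncols)  # ceiling division: enough rows for every region
fig, axes = plt.subplots(nrows, ncols, figsize=(12, 4 * nrows), squeeze=False)
flat_axes = axes.ravel()

# One subplot per region, titled with the row label
for ax, (region, values) in zip(flat_axes, demo.iterrows()):
    ax.bar(list(values.index), values.values)
    ax.set_title(region)
    ax.set_ylim(0, 1)  # shared scale so panels are comparable

# Hide any leftover axes when the number of regions is odd
for ax in flat_axes[len(demo):]:
    ax.set_visible(False)

fig.tight_layout()
```

Fixing the y-axis to the same range in every panel is a design choice that makes bar heights comparable across regions; the original grid uses `sharey=False`, so each panel there gets its own scale.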

In [None]:
region_cor_plot

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(25, 7), sharey=False)
fig.suptitle('Asian and Commonwealth of Independent States Countries',fontweight='bold',fontsize='xx-large')
sns.barplot(ax=axes[0,0], x=east_asia.index, y=east_asia.values)
axes[0,0].set_title(east_asia.name);
sns.barplot(ax=axes[0,1], x=south_asia.index, y=south_asia.values)
axes[0,1].set_title(south_asia.name);
sns.barplot(ax=axes[1,0], x=asean.index, y=asean.values)
axes[1,0].set_title(asean.name);
sns.barplot(ax=axes[1,1], x=commonwealth.index, y=commonwealth.values)
axes[1,1].set_title(commonwealth.name);

plt.tight_layout()       
plt.show()

In [None]:
fig, axes = plt.subplots(4, 1, figsize=(25, 15), sharey=False)
fig.suptitle('Asian and Commonwealth of Independent States Countries',fontweight='bold',fontsize='xx-large')
sns.barplot(ax=axes[0], x=east_asia.index, y=east_asia.values)
axes[0].set_title(east_asia.name, fontsize=30, weight='bold');
sns.barplot(ax=axes[1], x=south_asia.index, y=south_asia.values)
axes[1].set_title(south_asia.name, fontsize=30, weight='bold');
sns.barplot(ax=axes[2], x=asean.index, y=asean.values)
axes[2].set_title(asean.name,fontsize=30, weight='bold');
sns.barplot(ax=axes[3], x=commonwealth.index, y=commonwealth.values)
axes[3].set_title(commonwealth.name,fontsize=30, weight='bold');

plt.tight_layout()       
plt.show()

In [None]:
fig, axes = plt.subplots(4, 1, figsize=(25, 15), sharey=False)
fig.suptitle('European and American Countries',fontweight='bold',fontsize='xx-large')
sns.barplot(ax=axes[0], x=west_europe.index, y=west_europe.values)
axes[0].set_title(west_europe.name,fontsize=30, weight='bold');
sns.barplot(ax=axes[1], x=central_europe.index, y=central_europe.values)
axes[1].set_title(central_europe.name,fontsize=30, weight='bold');
sns.barplot(ax=axes[2], x=north_america.index, y=north_america.values)
axes[2].set_title(north_america.name,fontsize=30, weight='bold');
sns.barplot(ax=axes[3], x=latin.index, y=latin.values)
axes[3].set_title(latin.name,fontsize=30, weight='bold');

plt.tight_layout()       
plt.show()

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(25, 7), sharey=False)
fig.suptitle('African and Middle East Countries',fontweight='bold',fontsize='xx-large')
sns.barplot(ax=axes[0], x=middle_east.index, y=middle_east.values)
axes[0].set_title(middle_east.name,fontsize=30, weight='bold');
sns.barplot(ax=axes[1], x=sub_sahara.index, y=sub_sahara.values)
axes[1].set_title(sub_sahara.name,fontsize=30, weight='bold');

plt.tight_layout()       
plt.show()