# Introduction 
Hello! This is my analysis of the World Happiness Report. I take a top-down approach, moving from a macro perspective to a more micro one. I have hidden some of the cells as they are mainly code, so if you want to see how I build my visualisations you can expand those sections. I mainly use plotly for the visualisations, with the exception of seaborn for the heatmap.

# Top-down approach
We will be taking a top-down approach to this - going from macro to micro. 
- We will be looking at the correlations between the ladder score and the different factors, followed by the correlations between the factors themselves.
- After that, we will be comparing the regions to see which ones lead in each factor.
- We will then be looking at the top 5 and the bottom 5 countries by ladder score, and how they fare on each factor.
- Finally, we will be looking at Singapore (my home country) and how it has progressed over the years. 

In [None]:
import numpy as np 
import pandas as pd 
import seaborn as sns 
import matplotlib.pyplot as plt 
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
from plotly.subplots import make_subplots
%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        

In [None]:
happiness_report = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report-2021.csv')
happiness_report = happiness_report.drop(['Standard error of ladder score', 'upperwhisker', 'lowerwhisker',
       'Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Ladder score in Dystopia'], axis = 1)


# Questions to answer? Regions and how they rank in each category 
# Any correlation between QOL and GDP? how about GDP and freedom to make life choices? GDP and corruption? 

# EDA of the Report 

In [None]:
happiness_report.describe()

In [None]:
happiness_report.head()

# Data Cleaning - Round 1 
The following columns are dropped:
- The 'Explained by' columns are dropped as they simply show how much of the ladder score each of the 6 main factors explains.
- Standard error of ladder score, upperwhisker and lowerwhisker are dropped as well.
- Ladder score in Dystopia has the same value of 2.43 for every country and is already reflected in Dystopia + residual.

# Correlations between factors 

In [None]:
fig_dims = (20,8)
fig, ax = plt.subplots(figsize = fig_dims)
sns.heatmap(happiness_report[['Ladder score', 'Logged GDP per capita','Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'] ].corr(), annot = True, cmap="YlGnBu" )
# need colour palette 

There are strong correlations between ladder score & logged GDP per capita, social support & logged GDP per capita, healthy life expectancy & logged GDP per capita, and healthy life expectancy & social support.
Something that seems rather surprising to me is the very weak correlation between the ladder score and generosity.
We will be exploring these pairs in more detail below.
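
To read the heatmap numerically, the correlations with the ladder score can also be ranked directly. The sketch below is only an illustration: the table `toy` and all of its values are made up (they are not taken from the report), and only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the report.
toy = pd.DataFrame({
    'Ladder score':          [7.8, 7.6, 6.4, 5.1, 4.3, 3.1],
    'Logged GDP per capita': [10.8, 10.9, 10.6, 9.2, 8.1, 7.7],
    'Social support':        [0.95, 0.95, 0.91, 0.80, 0.70, 0.55],
    'Generosity':            [-0.10, 0.15, -0.12, 0.20, -0.05, 0.02],
})

# Pearson correlation of every factor with the ladder score,
# ordered by absolute strength (strongest first).
ladder_corr = (
    toy.corr()['Ladder score']
       .drop('Ladder score')
       .sort_values(key=abs, ascending=False)
)
print(ladder_corr)
```

Sorting by absolute value keeps strong negative correlations (such as perceptions of corruption in the real data) near the top instead of at the bottom.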


# Correlation between ladder score & logged GDP per capita
We will first be looking at the correlation between ladder score & logged GDP per capita.

In [None]:
px.scatter(happiness_report, x="Ladder score", y="Logged GDP per capita", trendline="ols")

# Correlation between social support & logged GDP per capita

In [None]:
px.scatter(happiness_report, x="Social support", y="Logged GDP per capita", trendline="ols")

# Correlation between logged GDP per capita & healthy life expectancy
It is not surprising that there is a correlation between GDP per capita and healthy life expectancy, as it could be linked to things such as better healthcare infrastructure and better systems to improve quality of life like water cleanliness, etc.

In [None]:
px.scatter(happiness_report, x="Healthy life expectancy", y="Logged GDP per capita", trendline="ols")

# Correlation between social support & healthy life expectancy 
This correlation makes sense as well, as increased social support could be linked to factors such as lower stress and better emotional wellbeing.

In [None]:
px.scatter(happiness_report, x="Healthy life expectancy", y="Social support", trendline="ols")

# Correlation between generosity & ladder score
One result that does not seem to make sense is the very weak correlation between generosity and ladder score.

In [None]:
px.scatter(happiness_report, x="Ladder score", y="Generosity", trendline="ols")

We can tell there are quite a few outliers with high generosity values and rather low ladder scores, and vice versa.
Since the data is so scattered, there is little to no correlation between a country's generosity and its ladder score.
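
"Little to no correlation" can be put into numbers with a least-squares line and Pearson's r. The sketch below uses NumPy as a stand-in for the `trendline="ols"` fit in the plots, and the six data points are made up for illustration rather than taken from the report.

```python
import numpy as np

# Hypothetical (ladder score, generosity) pairs for illustration only.
ladder = np.array([7.8, 7.6, 6.4, 5.1, 4.3, 3.1])
generosity = np.array([-0.10, 0.15, -0.12, 0.20, -0.05, 0.02])

# Least-squares line: generosity ~ slope * ladder + intercept.
slope, intercept = np.polyfit(ladder, generosity, deg=1)

# Pearson r; r squared is the share of variance the line explains.
r = np.corrcoef(ladder, generosity)[0, 1]
r_squared = r ** 2

print(f"slope={slope:.3f}, r={r:.3f}, r^2={r_squared:.3f}")
```

An r squared close to zero means the fitted line says almost nothing about generosity, which is what a widely scattered plot looks like.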

# Factor analysis
We will be looking at each factor by region to see which regions stand out in each factor.
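
The regional comparison boils down to a grouped mean per factor. A minimal sketch on made-up rows (the countries and values are hypothetical; only the column names follow the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the report.
toy = pd.DataFrame({
    'Country name': ['A', 'B', 'C', 'D'],
    'Regional indicator': ['Western Europe', 'Western Europe',
                           'South Asia', 'South Asia'],
    'Social support': [0.95, 0.91, 0.60, 0.70],
})

# Select the numeric column before aggregating so that text columns
# such as 'Country name' are never passed to mean().
regional_mean = (
    toy.groupby('Regional indicator')['Social support']
       .mean()
       .sort_values(ascending=False)
)
print(regional_mean)
```

Selecting the column first matters on recent pandas versions, where calling `mean()` on a grouped frame that still contains text columns raises an error.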


# Logged GDP per capita
GDP per capita is the value of all the goods and services a country produces in a year divided by its population.
Since GDP per capita values are large and vary widely between countries, the dataset reports their logarithm, which gives smaller and more comparable values.
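
To see what the logarithm does to the scale, here is a small sketch. The dollar figures are made up, and the use of the natural logarithm is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical GDP per capita figures in dollars, for illustration only.
gdp_per_capita = np.array([800.0, 5_000.0, 40_000.0, 100_000.0])

# Natural logarithm (an assumption for this sketch).
logged = np.log(gdp_per_capita)

# The raw values span a factor of 125 between the smallest and largest.
spread_ratio = gdp_per_capita.max() / gdp_per_capita.min()
print(spread_ratio)

# After logging, the same countries sit within a few units of each other.
print(logged.round(2))
```

On the log scale, equal steps correspond to equal percentage differences in GDP per capita, which is why the scatter plots against the logged values look roughly linear.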

In [None]:
# Select the column before averaging so text columns are not passed to mean()
fig0 = px.bar(happiness_report.groupby("Regional indicator")['Logged GDP per capita'].mean(), title = 'Different Regions and their logged GDP per capita').update_traces(marker=dict(color='dodgerblue'))
fig0.update_layout(xaxis={'categoryorder':'total descending'},
                                  plot_bgcolor = '#ffffff')

As expected, Western Europe, North America and ANZ, and East Asia are the three regions with the highest logged GDP per capita.

# Social Support
Social support is defined as having a social environment to rely on for help. We will be comparing the social support of the different regions.


In [None]:
fig1 = px.bar(happiness_report.groupby("Regional indicator")['Social support'].mean(), title = 'Different Regions and their amount of social support').update_traces(marker=dict(color='dodgerblue'))
fig1.update_layout(xaxis={'categoryorder':'total descending'},
                   plot_bgcolor = '#ffffff')

In [None]:
fig2 = px.bar(happiness_report.groupby("Regional indicator")['Freedom to make life choices'].mean(), title = 'Different Regions and their freedom to make life choices').update_traces(marker=dict(color='dodgerblue'))
fig2.update_layout(xaxis={'categoryorder':'total descending'},
                                  plot_bgcolor = '#ffffff')

In [None]:
fig3 = px.bar(happiness_report.groupby("Regional indicator")['Perceptions of corruption'].mean(), title = 'Different Regions and their perceptions of corruption').update_traces(marker=dict(color='dodgerblue'))
fig3.update_layout(xaxis={'categoryorder':'total descending'},
                                  plot_bgcolor = '#ffffff')

In [None]:
fig4 = px.bar(happiness_report.groupby("Regional indicator")['Generosity'].mean(), title = 'Different Regions and their Generosity').update_traces(marker=dict(color='dodgerblue'))
fig4.update_layout(xaxis={'categoryorder':'total ascending'},
                                  plot_bgcolor = '#ffffff')

# Top 5 & bottom 5 of each factor 

In [None]:
# .copy() gives independent frames, so adding a column does not raise SettingWithCopyWarning
top = happiness_report.sort_values(by ='Ladder score', ascending = False)[:5].copy()
top['Top or Bottom'] = 'Top'
bottom = happiness_report.sort_values(by ='Ladder score', ascending = True)[:5].copy()
bottom['Top or Bottom'] = 'Bottom'
top_and_bottom = pd.concat([top,bottom])
top_and_bottom = top_and_bottom.drop(['Dystopia + residual', 'Generosity'], axis = 1)
top_and_bottom

# Data Cleaning - Round 2
- The top 5 and bottom 5 countries by ladder score are split into their own datasets.
- 'Dystopia + residual' and 'Generosity' are removed from the combined dataset.
- Generosity is removed as it has little to no correlation with the ladder score, as seen in the correlation section above.
- 'Top or Bottom' is added as a categorical variable so that colour can differentiate the two groups.
- The top and bottom datasets are then concatenated for easier use.
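
The steps above can also be sketched with `nlargest` and `nsmallest`, which is an alternative to sorting and slicing rather than what the cell above does. The frame `toy` and its scores are hypothetical.

```python
import pandas as pd

# Hypothetical scores for illustration only.
toy = pd.DataFrame({
    'Country name': list('ABCDEFGH'),
    'Ladder score': [7.8, 3.1, 6.4, 5.1, 4.3, 7.6, 3.5, 6.0],
})

n = 2  # the notebook uses 5

# assign() returns a new frame, so the original table is left untouched.
top = toy.nlargest(n, 'Ladder score').assign(**{'Top or Bottom': 'Top'})
bottom = toy.nsmallest(n, 'Ladder score').assign(**{'Top or Bottom': 'Bottom'})

top_and_bottom = pd.concat([top, bottom], ignore_index=True)
print(top_and_bottom)
```

The label column is what lets a single `px.bar(..., color='Top or Bottom')` call colour the two groups differently.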

In [None]:
fig = px.bar(top_and_bottom, x = 'Regional indicator', color = 'Top or Bottom')
fig.update_layout(title = 'Regions of the Top 5 and Bottom 5 countries ', xaxis={'categoryorder':'total descending'},
                                  plot_bgcolor = '#ffffff')
fig 
# TODO: revisit the colour scheme

In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Ladder score', color = 'Top or Bottom')
fig.update_layout(title = 'Ladder score of the Top 5 and Bottom 5 countries ', xaxis={'categoryorder':'total descending'},
                                  plot_bgcolor = '#ffffff')
fig 

Let us dive deeper into why the countries have their ladder score. We will be exploring the individual factors.


In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Logged GDP per capita', color = 'Top or Bottom')
fig.update_layout(title = 'Logged GDP per capita of the Top 5 and Bottom 5 countries ',
                                  plot_bgcolor = '#ffffff')
fig 

It would seem that the top 5 countries have a relatively high GDP per capita, while the bottom 5 have a low GDP per capita with the exception of Botswana. 

In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Social support', color = 'Top or Bottom')
fig.update_layout(title = 'Social support of the Top 5 and Bottom 5 countries ',
                                  plot_bgcolor = '#ffffff')
fig 

The top 5 countries have relatively high levels of social support, with Iceland slightly edging out the rest. Among the bottom 5 countries, Afghanistan and Rwanda have lower social support than the others.

In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Healthy life expectancy', color = 'Top or Bottom')
fig.update_layout(title = 'Life expectancy of the Top 5 and Bottom 5 countries ',
                                  plot_bgcolor = '#ffffff')
fig 

It would seem that the top 5 countries have relatively high and even life expectancies, while the bottom 5 countries have generally lower life expectancies, with Lesotho at the bottom.


In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Freedom to make life choices', color = 'Top or Bottom')
fig.update_layout(title = 'Freedom to make life choices of the Top 5 and Bottom 5 countries ',
                                  plot_bgcolor = '#ffffff')
fig 

The top 5 all have relatively high freedom to make life choices. The bottom 5 countries have varying freedom scores: Afghanistan's is terrifyingly low, while Rwanda's is rather high.

In [None]:
fig = px.bar(top_and_bottom, x = 'Country name', y='Perceptions of corruption', color = 'Top or Bottom')
fig.update_layout(title = 'Perceptions of corruption of the Top 5 and Bottom 5 countries ',
                                  plot_bgcolor = '#ffffff')
fig 

This is rather interesting. Iceland has a rather high perception of corruption amongst the top 5, while most of the bottom 5 countries have high perceptions of corruption. 

# Conclusion
It would seem that the top 5 have high scores on the "positive factors" and relatively low scores on negative factors (with the exception of Iceland on perceptions of corruption).
Meanwhile, each country in the bottom 5 has low scores on the positive factors and relatively high scores on negative factors (with the exception of Rwanda on perceptions of corruption).

# Singapore's World Happiness ladder score and factors over the years 


In [None]:
happiness_report_older = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report.csv')
singapore_1 = happiness_report_older[happiness_report_older['Country name'] == 'Singapore'] 
singapore_1 
singapore_1 = singapore_1.drop(['Positive affect','Negative affect'], axis = 1)
singapore_1 = singapore_1.fillna(0)
singapore_1

In [None]:
happiness_report['year'] = 2020
# Copy the Singapore row so the changes below do not touch happiness_report
singapore_2 = happiness_report[happiness_report['Country name'] == 'Singapore'].copy()
# Rename the 2021-report columns to match the older dataset
singapore_2 = singapore_2.rename(columns={
    'Ladder score': 'Life Ladder',
    'Logged GDP per capita': 'Log GDP per capita',
    'Healthy life expectancy': 'Healthy life expectancy at birth'})
singapore_2 = singapore_2.drop(['Dystopia + residual', 'Regional indicator'], axis = 1)
singapore_2

In [None]:
singapore = pd.concat([singapore_1, singapore_2])
singapore.reset_index(drop=True, inplace=True)
singapore['Year'] = singapore['year']
singapore = singapore.drop(['year'], axis = 1)
singapore



# Data Cleaning - Round 3
- Firstly, the column names of the 2020 data are changed to match those of the original World Happiness dataset.
- Secondly, the Positive affect and Negative affect columns are dropped from the original dataset.
- The year is added to the 2020 data.
- Lastly, the two datasets are concatenated.
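
The same alignment can be sketched on two tiny frames. The frames `older` and `latest` and their values are hypothetical; only the column names follow the two CSV files.

```python
import pandas as pd

# Hypothetical rows for illustration only.
older = pd.DataFrame({
    'Country name': ['Singapore', 'Singapore'],
    'year': [2018, 2019],
    'Life Ladder': [6.1, 6.3],
})
latest = pd.DataFrame({
    'Country name': ['Singapore'],
    'Ladder score': [6.2],
})

# Align the newer frame with the older schema, then stack the rows.
latest = latest.rename(columns={'Ladder score': 'Life Ladder'}).assign(year=2020)
combined = pd.concat([older, latest], ignore_index=True)
print(combined)
```

`pd.concat` matches columns by name, so any column that is not renamed first ends up as a separate column filled with missing values.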

# Time Series analysis with regards to Singapore and the different factors 

In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Life Ladder']))
fig.update_layout(title='Singapore\'s ladder score over the years',
                   xaxis_title='Year',
                   yaxis_title='Ladder score',
                 colorway = ['#ee2536'],
                 plot_bgcolor = '#ffffff' )
fig

It is rather interesting that Singapore's ladder score has distinct peaks, as seen in 2007 and 2014.
There are also distinct drops, as seen in 2009 and 2016.
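
Peaks and drops like these can also be flagged programmatically from the year-over-year change. The series below is made up for illustration and is not Singapore's actual data.

```python
import pandas as pd

# Hypothetical yearly scores for illustration only.
scores = pd.Series(
    [6.5, 6.8, 6.6, 6.1, 6.5, 6.6],
    index=[2006, 2007, 2008, 2009, 2010, 2011],
    name='Life Ladder',
)

change = scores.diff()  # difference from the previous year

# A peak rises into the year and falls out of it; a dip is the reverse.
peaks = scores[(change > 0) & (change.shift(-1) < 0)]
dips = scores[(change < 0) & (change.shift(-1) > 0)]

print(peaks.index.tolist(), dips.index.tolist())
```

The first and last years can never be flagged this way, since each has only one neighbour to compare against.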


In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Log GDP per capita']))
fig.update_layout(title='Singapore\'s log GDP per capita over the years',
                   xaxis_title='Year',
                   yaxis_title='Log GDP per capita',
                 colorway = ['#ee2536'],
                 plot_bgcolor = '#ffffff')
fig

Singapore's log GDP per capita has been on an increasing trend over the years, with the exception of 2008 and 2009, when the global recession of 2008-2009 took effect.

In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Social support']))
fig.update_layout(title='Singapore\'s social support over the years',
                   xaxis_title='Year',
                   yaxis_title='Social support',
                  colorway = ['#ee2536'],
                  plot_bgcolor = '#ffffff' )
fig

Wow - there is a lot of variation in social support over the years. There are distinct lows in 2013 and 2014, and distinct highs in 2007 and 2016.


In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Healthy life expectancy at birth']))
fig.update_layout(title='Singapore\'s life expectancy at birth over the years',
                   xaxis_title='Year',
                   yaxis_title='Life expectancy at birth',
                  colorway = ['#ee2536'],
                 plot_bgcolor = '#ffffff')
fig

An increasing trend over the years, with a slight drop in 2020 - but that is to be expected. Covid-19 struck in 2020, so it is not surprising that life expectancy dropped slightly.


In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Freedom to make life choices']))
fig.update_layout(title='Singapore\'s freedom to make life choices over the years',
                   xaxis_title='Year',
                   yaxis_title='Freedom to make life choices',
                  colorway = ['#ee2536'],
                  plot_bgcolor = '#ffffff'  )
fig

There's a slight increasing trend, but there was a big drop in 2008. This could be linked to the 2008-2009 recession, which put pressure on people's finances.

In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Generosity']))
fig.update_layout(title='Singapore\'s generosity over the years',
                   xaxis_title='Year',
                   yaxis_title ='Generosity',
                  colorway = ['#ee2536'],
                  plot_bgcolor = '#ffffff')
fig

Generosity amongst Singaporeans has decreased over time, with a spike in 2007 and an all-time low in 2011.

In [None]:
fig = go.Figure(data=go.Scatter(x = singapore['Year'], y = singapore['Perceptions of corruption']))
fig.update_layout(title='Singapore\'s perception of corruption over the years',
                   xaxis_title='Year',
                   yaxis_title='Perception of corruption',
                  colorway = ['#ee2536'],
                  plot_bgcolor = '#ffffff'  )
fig

There were peaks in 2013 and 2017, and the perception of corruption has increased over the years.

# Conclusion with regards to Singapore's ladder score
From the analysis of the individual factors, it would seem that some dips in the ladder score are explained by the factors, such as in 2008-2009 during the recession. However, there are some years where the factors do not account for the movement in the ladder score, which could be due to other influences (e.g. events happening in that year such as elections).

# This is the end of the notebook. If you like my analysis, please do give it a thumbs up. I am always willing to improve on my notebook, so do let me know if there is anything I can improve on as well. Thank you! 
