# World Happiness Report 2015 - 2019 Analysis

![image](https://ichef.bbci.co.uk/news/976/cpsprodpb/A204/production/_106067414_2.jpg)

The  **World Happiness Report**, published by the Earth Institute and co-edited by the institute’s director, Jeffrey Sachs, reflects a new worldwide demand for more attention to happiness and absence of misery as criteria for government policy. It reviews the state of happiness in the world today and shows how the new science of happiness explains personal and national variations in happiness.  It contains articles and rankings of national happiness, based on respondent ratings of their own lives, which the report also correlates with various (quality of) life factors.

The report has been published since 2012 and gives insight into the happiness rankings of countries around the world. The dataset consists of five CSV files, each containing the happiness rankings of a different year.

Data is collected from people in over 150 countries. Each variable measured reveals a population-weighted average score on a scale running from 0 to 10 that is tracked over time and compared against other countries. These variables currently include:

- real GDP per capita
- social support
- healthy life expectancy
- freedom to make life choices
- generosity
- perceptions of corruption

In this notebook, we try to answer the following questions:

- What makes people in a country happy?

- In which countries did the happiness score change remarkably?

- From 2015 to 2019, how did the variables change overall?

We hope you have a good time reading this notebook. 


# Importing Libraries

In [None]:
import os
import textwrap

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

import seaborn as sns
import missingno as msno
import plotly.express as px
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
plt.style.use('ggplot')

from statsmodels.stats.outliers_influence import variance_inflation_factor

# Imputer
from sklearn.impute import KNNImputer

# Reading Dataset

In [None]:
def read_data():
    all_data = {}
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            path = os.path.join(dirname, filename)
            all_data[f'data_{path[-8:-4]}'] = pd.read_csv(path)
    
    all_data = {key: all_data[key] for key in sorted(all_data)}
    
    return all_data

all_data = read_data()
for name, df in all_data.items():
    print(f'{name} has {df.shape[0]} rows and {df.shape[1]} columns')

In [None]:
data2015 = all_data['data_2015']
data2016 = all_data['data_2016']
data2017 = all_data['data_2017']
data2018 = all_data['data_2018']
data2019 = all_data['data_2019']

# Data Wrangling

In [None]:
# Let's investigate the data and see if they have any similar or different columns
print('Displaying 2015 data')
display(data2015.head())
print('\nDisplaying 2016 data', '-'*100)
display(data2016.head())

In [None]:
print('Displaying 2017 data')
display(data2017.head())
print('\nDisplaying 2018 data', '-'*100)
display(data2018.head())

In [None]:
print('Displaying 2019 data')
display(data2019.head())

So,

The 2015 and 2016 columns are similar, except that 2015 reports a standard error while 2016 reports a confidence interval (CI). We have a formula to find the CI (for the 2015 data) and the standard deviation (SD) from the CI (for the 2016 data).
The formula:
SD = sqrt(N) * (upper limit - lower limit) / 3.92

    note: 3.92 (2 × 1.96, the width of a 95% interval in standard errors) applies to samples larger than 100

The 2018 and 2019 columns are also similar to each other, but they have neither a standard deviation nor a CI. Since these measures are not available for every year, we remove them.
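
The conversion above can be sketched in a few lines. This is only an illustration: the function names `sd_from_ci` and `ci_from_se` are hypothetical, the numbers are made up, and the per-country sample size `n` is an assumption because it is not part of the CSV files.

```python
import math

def sd_from_ci(lower, upper, n, z_width=3.92):
    """Recover a standard deviation from a 95% confidence interval of a mean.

    z_width = 2 * 1.96 is the width of a 95% interval measured in
    standard errors (a large-sample approximation).
    """
    standard_error = (upper - lower) / z_width
    return math.sqrt(n) * standard_error

def ci_from_se(mean, standard_error, z=1.96):
    """Build a 95% confidence interval from a mean and its standard error."""
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical values, not taken from the dataset
lower, upper = ci_from_se(mean=7.5, standard_error=0.04)
sd = sd_from_ci(lower, upper, n=1000)
print(round(lower, 4), round(upper, 4), round(sd, 4))
```

Going from a standard error to a CI and back again returns the same standard error, which is a quick way to check that the two columns describe the same uncertainty.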

The first thing I would like to do is rename the columns to shorter names :)

Country names can stay as they are. Region can also stay: I will concatenate the data frames and fill Region from the 2015 values by Country, so nothing needs to be done for now.
Happiness Rank becomes Rank, Happiness Score becomes Score, Standard Error is removed, Economy (GDP per Capita) becomes GDP, Health (Life Expectancy) becomes Life Expectancy, and so on. The full mapping is in the following section, so there is no need to list every column here. Let's start renaming.


## Filtering - Data 2015 

In [None]:
data2015.columns

In [None]:
# Renaming columns of data 2015 
data2015 = data2015.rename(columns={'Happiness Rank': 'Rank',
                        'Happiness Score' : 'Score',
                        'Economy (GDP per Capita)' : 'GDP',
                        'Health (Life Expectancy)' : 'Life Expectancy',
                        'Trust (Government Corruption)': 'Trust'})

# Create a year column
data2015['Year'] = pd.to_datetime(2015, format='%Y').year

# Dropping std 
data2015 = data2015.drop(['Standard Error', 'Dystopia Residual'], axis=1)

## Filtering - Data 2016

In [None]:
data2016.columns

In [None]:
# Renaming columns of data 2016
data2016 = data2016.rename(columns={'Happiness Rank': 'Rank',
                                    'Happiness Score' : 'Score',
                                    'Economy (GDP per Capita)' : 'GDP',
                                    'Health (Life Expectancy)' : 'Life Expectancy',
                                    'Trust (Government Corruption)': 'Trust'})

# Create a year column
data2016['Year'] = pd.to_datetime(2016, format='%Y').year

# Dropping CI, Dystopia Residual and Region (Region is refilled per country after concatenation)
data2016 = data2016.drop(['Lower Confidence Interval','Upper Confidence Interval', 'Dystopia Residual', 'Region'], axis=1)

## Filtering - Data 2017

In [None]:
data2017.columns

In [None]:
# Renaming columns of data 2017
data2017 = data2017.rename(columns={'Happiness.Rank': 'Rank',
                                    'Happiness.Score' : 'Score',
                                    'Economy..GDP.per.Capita.' : 'GDP',
                                    'Health..Life.Expectancy.' : 'Life Expectancy',
                                    'Trust..Government.Corruption.': 'Trust'})

# Create a year column
data2017['Year'] = pd.to_datetime(2017, format='%Y').year

# Dropping CI
data2017 = data2017.drop(['Whisker.high','Whisker.low', 'Dystopia.Residual'], axis=1)

## Filtering - Data 2018

In [None]:
data2018.columns

In [None]:
# Renaming columns of data 2018
data2018 = data2018.rename(columns = {'Overall rank' : 'Rank',
                                     'Country or region' : 'Country',
                                     'GDP per capita' : 'GDP',
                                     'Social support' : 'Family',
                                     'Healthy life expectancy' : 'Life Expectancy',
                                     'Freedom to make life choices' : 'Freedom',
                                     'Perceptions of corruption': 'Trust'})

# Create a year column
data2018['Year'] = pd.to_datetime(2018, format='%Y').year

## Filtering - Data 2019

In [None]:
data2019.columns

In [None]:
# Renaming columns of data 2019
data2019 = data2019.rename(columns = {'Overall rank' : 'Rank',
                                     'Country or region' : 'Country',
                                     'GDP per capita' : 'GDP',
                                     'Social support' : 'Family',
                                     'Healthy life expectancy' : 'Life Expectancy',
                                     'Freedom to make life choices' : 'Freedom',
                                     'Perceptions of corruption': 'Trust'})

# Create a year column
data2019['Year'] = pd.to_datetime(2019, format='%Y').year

In [None]:
print('Displaying 2015 data')
display(data2015.head())

print('\nDisplaying 2016 data', '-'*100)
display(data2016.head())

print('\nDisplaying 2017 data', '-'*100)
display(data2017.head())

print('\nDisplaying 2018 data', '-'*100)
display(data2018.head())

print('\nDisplaying 2019 data', '-'*100)
display(data2019.head())

## Concatenating Dataset

In [None]:
# Merging all the dataset into one dataset
happiness= pd.concat([data2015, data2016, data2017, data2018, data2019], 
                     ignore_index=True)

In [None]:
happiness

## In the Search of Missings

Let's see if the data has any missing values.

Also, the scatter plots and the descriptive table show that every feature has a minimum score of 0.
We will replace the 0s with np.nan and impute them with KNN.
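
As a minimal sketch of that idea, the toy frame below (made-up values, not from the report) has a single zero in `Trust`. The zero is turned into `NaN` and then filled from the two rows that are closest on the remaining column. The choice `n_neighbors=2` is an assumption for this small example; the notebook itself uses the imputer's defaults.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data, not from the report: row 3 has a zero in 'Trust'
toy = pd.DataFrame({
    'GDP':   [1.40, 1.30, 0.30, 0.25, 1.35],
    'Trust': [0.40, 0.38, 0.10, 0.00, 0.42],
})

# Treat zeros as missing values
toy_nan = toy.replace(0, np.nan)

# Fill each gap with the mean of the 2 nearest rows that have a value
toy_imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(toy_imputer.fit_transform(toy_nan), columns=toy.columns)
print(filled)
```

Because distances are computed only on the columns that are present, the filled value depends on how similar the other features are, so the result is sensitive to which columns are passed to the imputer.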

In [None]:
msno.matrix(happiness)
plt.show()

In [None]:
happiness.isna().sum()

In [None]:
# Forward fill missing Region values within each country
happiness['Region'] = happiness.groupby('Country')['Region'].ffill()

# KNN imputation for the missing Trust value
imputer = KNNImputer()
happiness.iloc[:,2:11] = imputer.fit_transform(happiness.iloc[:,2:11])

In [None]:
happiness.isna().sum()

There are still 25 rows with a missing Region. We will use the following dictionary to fill the remaining NA values.

In [None]:
# Regions for the countries whose Region is still missing
# (named region_map so that the built-in map() is not shadowed)
region_map = {'Belize': 'Latin America and Caribbean', 'Gambia': 'Sub-Saharan Africa', 
              'Hong Kong S.A.R., China': 'Eastern Asia', 'Namibia': 'Sub-Saharan Africa', 
              'North Macedonia': 'Central and Eastern Europe', 'Northern Cyprus': 'Middle East and Northern Africa', 
              'Puerto Rico': 'Latin America and Caribbean', 'Somalia': 'Sub-Saharan Africa', 
              'Somaliland Region': 'Sub-Saharan Africa', 'South Sudan': 'Sub-Saharan Africa', 
              'Taiwan Province of China': 'Southeastern Asia', 'Trinidad & Tobago': 'Latin America and Caribbean'}

# Assign the region to every row whose country is listed in the dictionary
in_map = happiness['Country'].isin(list(region_map))
happiness.loc[in_map, 'Region'] = happiness.loc[in_map, 'Country'].map(region_map)

In [None]:
happiness.isna().sum()

Now let's look at the zeros in the data.

In [None]:
happiness[happiness['GDP']==0]

In [None]:
happiness[happiness['Trust']==0]

In [None]:
happiness[happiness['Family']==0]

In [None]:
happiness[happiness['Freedom']==0]

In [None]:
happiness[happiness['Generosity']==0]

As you can see, we have zeros; let's change them to NaNs.

In [None]:
# Replacing 0's with nans
happiness = happiness.replace(0, np.nan)
happiness.isna().sum()

In [None]:
# Impute the missing values with the KNN imputer
happiness.iloc[:,2:10] = imputer.fit_transform(happiness.iloc[:,2:10])
happiness.describe().T

We believe the data is in better shape now.
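
One way to back up that impression is to count the zeros and missing values that remain in each numeric column. The helper below is a hypothetical sketch (`summarize_gaps` is not part of the notebook) and is shown on made-up data.

```python
import numpy as np
import pandas as pd

def summarize_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Count zeros and NaNs in every numeric column of df."""
    numeric = df.select_dtypes(include='number')
    return pd.DataFrame({
        'zeros': (numeric == 0).sum(),
        'missing': numeric.isna().sum(),
    })

# Toy data, not from the report
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'GDP': [1.2, 0.0, 0.9],
    'Trust': [0.3, np.nan, 0.0],
})
report = summarize_gaps(toy)
print(report)
```

Calling `summarize_gaps(happiness)` after the imputation step should show zero in both columns if the cleaning worked as intended.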

# Analysing Dataset Functions

In [None]:
def get_heatmap(data: pd.DataFrame, year: str):  
    fig, ax = plt.subplots(figsize=(11, 8)) 
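    # numeric_only skips the text columns (Country, Region), which newer pandas no longer drops silently
    sns.heatmap(data.corr(numeric_only=True), annot=True, fmt='.2f', linewidths=3, cmap='coolwarm',
              ax=ax, annot_kws={'size': 12, 'color':'black'})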
    ax.set_title('Data - ' + year, fontsize=15, fontweight='bold', pad=5)
    plt.xticks(rotation=45, weight='bold')
    plt.yticks(weight='bold')
    plt.show()
    
    
def get_vif(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate the Variance Inflation Factor (VIF) of every numeric column
    of the given dataframe and return the values as a dataframe.
    """
    # Keep only numeric columns (public API instead of the private _get_numeric_data)
    dataframe = dataframe.select_dtypes(include='number')
    vif_df = pd.DataFrame()
    vif_df['columns'] = dataframe.columns
    vif_df['VIF Value'] = [variance_inflation_factor(dataframe.values, i) for i in range(dataframe.shape[1])]
    return vif_df


def subplot_score(data: pd.DataFrame, year: str):
    fig, ax = plt.subplots(2, 3, figsize=(18, 8))

    plot_columns = ['GDP', 'Family', 'Life Expectancy', 'Freedom', 'Trust', 'Generosity']
    plot_color = ['red', 'green', 'blue', 'purple', 'yellow', 'orange']

    for i in range(6):
        m = i // 3
        n = i % 3

        ax[m, n].scatter('Score', plot_columns[i], data=data, color=plot_color[i],
                    marker='o')
        ax[m, n].set_xlabel('Score', fontweight='bold') 
        ax[m, n].set_ylabel(plot_columns[i], fontweight='bold')

    fig.suptitle('Score variable in data ' + year, fontsize=16)
    plt.show()
    
    
def get_seaborn_bar(data: pd.DataFrame, y: str, plot_title: str, 
                    plot_color:str = 'Paired', title_color: str = 'black'):
    fig, ax = plt.subplots(1, 1, figsize=(15, 7))

    plot = sns.barplot(ax=ax, x=data['Country'], y=data[y],
                     palette=sns.color_palette(plot_color, data.shape[0]))

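    # Label each bar with the happiness score, using the bar position rather than the index label
    for position, (_, row) in enumerate(data.iterrows()):
        plot.text(x=position, y=row[y]*1.01, s=round(row['Score'], 2), 
                  ha='center', color='black')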

    ax.set_title(plot_title, fontdict={'fontweight':'bold', 'color':title_color})
    ax.set_xticklabels(textwrap.fill(x.get_text(), 7, subsequent_indent='-') for x in ax.get_xticklabels())
    ax.set_xlabel('Country', fontweight='bold')
    ax.set_ylabel(y, fontweight='bold')
    return ax

# Analysis of Data - 2015


In [None]:
# The zeros are imputed now, so re-extract the 2015 data from the combined frame
data2015 = happiness[happiness['Year']==2015]

In [None]:
# Checking data types of all the columns
data2015.dtypes

In [None]:
data2015.describe().T

In [None]:
get_heatmap(data=data2015.drop('Year', axis=1), year='2015')

In [None]:
get_vif(data2015.drop(['Rank','Score', 'Year'], axis=1))

In [None]:
subplot_score(data=data2015, year='2015')

The scatter plots show that GDP, Family, Life Expectancy and Freedom are highly correlated with the happiness scores of the countries.
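
The visual impression can be complemented with numbers. The sketch below ranks features by their Pearson correlation with `Score`; the helper name `correlation_with_score` and the toy values are assumptions for illustration only.

```python
import pandas as pd

def correlation_with_score(df: pd.DataFrame, target: str = 'Score') -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include='number')
    return numeric.corr()[target].drop(target).sort_values(ascending=False)

# Toy data, not from the report
toy = pd.DataFrame({
    'Score': [7.5, 6.8, 5.2, 4.1, 3.6],
    'GDP': [1.40, 1.30, 0.90, 0.50, 0.30],
    'Generosity': [0.30, 0.10, 0.25, 0.20, 0.15],
})
ranking = correlation_with_score(toy)
print(ranking)
```

On the notebook's data, `correlation_with_score(data2015.drop(columns=['Rank', 'Year']))` would give the same numbers as the `Score` row of the heatmap, sorted.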

Furthermore, we see that Trust follows a trend in happy countries. We would like to analyze that.

In [None]:
# Dividing dataset 2015 on the basis on score
unhappy_2015 = data2015[data2015['Score'] < 5]
happy_2015 = data2015[data2015['Score'] >= 5]

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(17, 8))

plot_1 = sns.regplot(x='Score', y='Trust', data=unhappy_2015, 
                     ax=ax[0], line_kws={"color": "red"}, 
                     scatter_kws={"color": "blue"})
ax[0].set_title('Unhappy Dataset (Score < 5)', fontweight='bold')
ax[0].set_xlabel('Happiness Score') 
ax[0].set_ylabel('Government Trust Score')

plot_2 = sns.regplot(x='Score', y= 'Trust', data= happy_2015, 
                     ax=ax[1], line_kws={"color": "red"},
                     scatter_kws={"color": "blue"})
ax[1].set_title('Happy Dataset (Score >= 5)', fontweight='bold')
ax[1].set_xlabel('Happiness Score') 
ax[1].set_ylabel('Government Trust Score')

fig.suptitle('Investigation of the Relationship Between Government Trust and Happiness', fontweight='bold')
plt.show()

As we expected, there is a clear relationship between Happiness and Government Trust in happy countries. As the regplot suggests, narrowing the data to happiness scores of 6.0 and above might give an even better regression line.
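
Whether a narrower threshold really gives a better line can be checked by comparing the R² of a straight-line fit at different cut-offs. The sketch below uses a hypothetical helper `r_squared_above` and toy data that was deliberately built so the effect is visible; it says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

def r_squared_above(df, threshold, x='Score', y='Trust'):
    """R^2 of a straight-line fit of y on x, using only rows with x >= threshold."""
    subset = df[df[x] >= threshold]
    r = np.corrcoef(subset[x], subset[y])[0, 1]
    return r ** 2

# Toy data (made up): noisy below 6.0, exactly linear above it
toy = pd.DataFrame({
    'Score': [5.0, 5.3, 5.6, 5.9, 6.2, 6.6, 7.0, 7.4],
    'Trust': [0.15, 0.08, 0.16, 0.09, 0.20, 0.28, 0.36, 0.44],
})
r2_from_5 = r_squared_above(toy, 5.0)
r2_from_6 = r_squared_above(toy, 6.0)
print(round(r2_from_5, 3), round(r2_from_6, 3))
```

A higher R² on a subset is not guaranteed: restricting the range of `Score` also removes variation, which can lower the correlation, so the comparison is worth running rather than assuming.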

Let's do the same for generosity

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(17, 8))

plot_1 = sns.regplot(x='Score', y='Generosity', data=unhappy_2015, 
                     ax=ax[0], line_kws={"color": "red"}, 
                     scatter_kws={"color": "blue"})
ax[0].set_title('Unhappy Dataset (Score < 5)', fontweight='bold')
ax[0].set_xlabel('Happiness Score') 
ax[0].set_ylabel('Generosity')

plot_2 = sns.regplot(x='Score', y= 'Generosity', data= happy_2015, 
                     ax=ax[1], line_kws={"color": "red"},
                     scatter_kws={"color": "blue"})
ax[1].set_title('Happy Dataset (Score >= 5)', fontweight='bold')
ax[1].set_xlabel('Happiness Score') 
ax[1].set_ylabel('Generosity')

fig.suptitle('Investigation of the Relationship Between Generosity and Happiness', fontweight='bold')
plt.show()

The pattern is nearly the same as with Trust. People living in happy countries trust their governments more than people in unhappy countries. However, the causality may run the other way:
if you trust your government and, with the GDP you have, you can be generous, you are happier.

## Region Wise Happiness Score in 2015

In [None]:
region_mean = data2015.groupby('Region')['Score'].mean().sort_values(ascending=False)
fig, ax = plt.subplots(1, 1, figsize=(15, 7))
plot = sns.barplot(x=region_mean.index, y=region_mean, ax=ax, palette=sns.color_palette("RdYlGn_r", len(region_mean)))
# Label each bar with its mean score (Series.iteritems was removed in pandas 2.0)
for i, (region, score) in enumerate(region_mean.items()):
    plot.text(x=i, y=score*1.01, s=round(score, 2), 
            ha='center', color='black')

ax.set_title('Region Wise Mean Happiness Score', fontdict={'fontweight':'bold'})
ax.set_xlabel('Region', fontweight='bold')
ax.set_ylabel('Score', fontweight='bold')
ax.set_xticklabels(textwrap.fill(x.get_text(), 10, subsequent_indent='-') for x in ax.get_xticklabels())
plt.show()

According to the bar plot above, the Australia and New Zealand region and North America have the highest mean happiness scores, while Sub-Saharan Africa has the lowest in 2015.

## Who is the Happiest and least Happy in the World in 2015

In [None]:
# Visualization of the top 15 happy countries in 2015
top_happy_countries = data2015.sort_values('Score', ascending= False).head(n=15)
ax = get_seaborn_bar(data=top_happy_countries, y='Score', 
                     plot_title='Top 15 Happy Countries in the World', 
                     plot_color='hls', title_color='green')
plt.show()

In [None]:
# Visualization of the top 15 sad countries in 2015
top_sad_countries = data2015.sort_values('Score', 
                                         ascending= False).tail(n=15)[::-1].reset_index(drop=True)
ax = get_seaborn_bar(data=top_sad_countries, 
                     y='Score', 
                     plot_title='Top 15 Sad Countries in the World', 
                     plot_color='husl', title_color='red')
plt.show()

Nordic countries, Canada, New Zealand and Australia are the happiest countries in the world. On the other hand, African and Asian countries that suffer heavily from poverty or war (such as Syria) are the unhappiest of all.

## Can Money Buy Happiness?

In [None]:
# Visualization of the top 15 rich countries in 2015
richests = data2015.sort_values('GDP',ascending=False).head(n=15).reset_index(drop=True)
ax = get_seaborn_bar(data=richests, y='GDP', plot_title='Happiness Scores of the Top 15 Richest Countries in the World', 
                     plot_color='hls', title_color='green')
ax.annotate('Happiness Score of Country', xy=(14, 1.4), xytext=(10, 1.6),
            arrowprops=dict(arrowstyle='simple',
                            facecolor='black', 
                            connectionstyle='angle3,angleA=0,angleB=90'), 
                            bbox=dict(boxstyle='round, pad=0.7', 
                                      facecolor='w', edgecolor='black'))
plt.show()

In [None]:
# Visualization of the top 15 poor countries in 2015
poorests = data2015.sort_values('GDP',ascending=True).head(n=15).reset_index(drop=True)
ax = get_seaborn_bar(data=poorests, y='GDP', plot_title='Happiness Scores of the Top 15 Poorest Countries in the World', 
                     plot_color='husl', title_color='red')
ax.annotate('Happiness Score of Country', xy=(7, 0.2), xytext=(2.5, 0.23),
            arrowprops=dict(arrowstyle='simple',
                            facecolor='black', 
                            connectionstyle='angle3,angleA=0,angleB=90'), 
                            bbox=dict(boxstyle='round, pad=0.7', 
                                      facecolor="w", edgecolor='black'))
plt.show()

Money buys happiness, globally :)

Let's investigate both bar charts. We know this is extra work, because the scatter plots already gave us the relationship between Score and GDP.

Among the richest countries, the minimum happiness score is 5.47, but as you can see this is an outlier; the next lowest score is 6.29. Given that the happiest country has a score of 7.49, we can say money can bring a good amount of happiness. But it is obvious that happiness cannot rest on money alone.

For the poorest countries, we have some interesting outcomes:

Congo - Kinshasa has a happiness score of 4.52 and Malawi, the 3rd, has 4.29. The happiest among the poorest countries is Somali with a score of 5.06. This bar chart supports our argument that money affects happiness but is not the only thing that matters. Still, keep in mind that money has a considerable effect on happiness.


# Analysis of Data - Remaining Years (2016 - 2019)

We have analysed the 2015 data and found some interesting patterns and relationships between features. Now we will analyse the 2016-2019 data with the plotly library.

First we will look at heatmaps, and then at the happiest, unhappiest, richest and poorest countries in each year.

In [None]:
# Assign new data
data2016 = happiness[happiness['Year'] == 2016]
data2017 = happiness[happiness['Year'] == 2017]
data2018 = happiness[happiness['Year'] == 2018]
data2019 = happiness[happiness['Year'] == 2019]

In [None]:
get_heatmap(data2016.drop('Year', axis=1), '2016')

In [None]:
get_heatmap(data2017.drop('Year', axis=1), '2017')

In [None]:
get_heatmap(data2018.drop('Year',axis=1), '2018')

In [None]:
get_heatmap(data2019.drop('Year', axis=1), '2019')

As expected, for all years happiness is closely tied to GDP, Family and Life Expectancy. However, GDP and Life Expectancy are highly correlated with each other and have high variance inflation factor scores, so we cannot be sure which one affects happiness the most.
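
To make the variance inflation factor concrete: for each feature it is 1 / (1 - R²), where R² comes from regressing that feature on all the others. The sketch below computes it with plain NumPy on synthetic data. The function name `vif` and the synthetic columns are assumptions, and unlike the notebook's `get_vif`, this version adds an intercept to every regression, so its values are not directly comparable.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        y = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ coef
        r2 = 1 - residuals.var() / y.var()
        out[i] = 1 / (1 - r2)
    return out

# Synthetic data: 'life' is built from 'gdp', 'freedom' is independent
rng = np.random.default_rng(0)
gdp = rng.normal(size=200)
life = gdp + 0.3 * rng.normal(size=200)
freedom = rng.normal(size=200)
values = vif(np.column_stack([gdp, life, freedom]))
print(values.round(2))
```

The two linked columns get a large VIF while the independent one stays close to 1, which is the pattern the notebook reports for GDP and Life Expectancy.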

Now let's plot happiest and unhappiest countries:

In [None]:
px.scatter(data_frame = happiness,
           x = 'GDP', 
           y = 'Score', 
           animation_frame = 'Year',
           animation_group = 'Country',
           size = 'Score', 
           color = 'Country', 
           hover_name = 'Rank', 
           title = 'Happiness Scores vs GDP')

In [None]:
# Creating data frames to plot top happy countries over the years
top_happy_countries_2015 = data2015.sort_values('Score', 
                                                ascending=False).head(n=15)

top_happy_countries_2016 = data2016.sort_values('Score', 
                                                ascending=False).head(n=15)

top_happy_countries_2017 = data2017.sort_values('Score', 
                                                ascending=False).head(n=15)

top_happy_countries_2018 = data2018.sort_values('Score', 
                                                ascending=False).head(n=15)

top_happy_countries_2019 = data2019.sort_values('Score', 
                                                ascending=False).head(n=15)

all_happy_countries = pd.concat([top_happy_countries_2015, 
                                 top_happy_countries_2016,
                                 top_happy_countries_2017, 
                                 top_happy_countries_2018,
                                 top_happy_countries_2019], ignore_index=True)

# Visualization of the top happy countries
px.bar(data_frame = all_happy_countries,
       x = 'Country', 
       y = 'Score', 
       animation_frame = 'Year', 
       color = 'GDP',
       title = 'Top 15 Happy Countries and Their GDP per year')

Over the years, Denmark always occupied one of the top 3 positions among the happiest countries. Switzerland dropped one position every year. Finland held 5th position in 2016 and 2017 but jumped to 1st in both 2018 and 2019, a significant improvement. Also, Luxembourg was not in the top 15 in 2016-2018 but entered it in 2019.
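
Rank movements like these are easier to read from a table with one row per country and one column per year. The sketch below builds such a table with `DataFrame.pivot`; apart from Finland's move from 5th to 1st, the ranks are made up for illustration.

```python
import pandas as pd

# Toy data: the Denmark ranks are hypothetical
toy = pd.DataFrame({
    'Country': ['Finland', 'Denmark', 'Finland', 'Denmark'],
    'Year': [2017, 2017, 2018, 2018],
    'Rank': [5, 2, 1, 3],
})

# One row per country, one column per year
rank_table = toy.pivot(index='Country', columns='Year', values='Rank')

# Positive change means the country moved up the ranking
rank_table['change'] = rank_table[2017] - rank_table[2018]
print(rank_table)
```

The same call on `happiness[['Country', 'Year', 'Rank']]` would give the full table, provided each country appears at most once per year.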


In [None]:
# Creating data frames to plot top sad countries over the years
top_sad_countries_2015 = data2015.sort_values('Score', 
                                         ascending=False).tail(n=15)[::-1].reset_index(drop=True)

top_sad_countries_2016 = data2016.sort_values('Score', 
                                         ascending=False).tail(n=15)[::-1].reset_index(drop=True)

top_sad_countries_2017 = data2017.sort_values('Score', 
                                         ascending=False).tail(n=15)[::-1].reset_index(drop=True)

top_sad_countries_2018 = data2018.sort_values('Score', 
                                         ascending=False).tail(n=15)[::-1].reset_index(drop=True)

top_sad_countries_2019 = data2019.sort_values('Score', 
                                         ascending=False).tail(n=15)[::-1].reset_index(drop=True)

all_sad_countries = pd.concat([top_sad_countries_2015,top_sad_countries_2016,
                      top_sad_countries_2017, top_sad_countries_2018,
                      top_sad_countries_2019], ignore_index=True)

# Visualization of the top sad countries
px.bar(data_frame = all_sad_countries, 
       x = 'Country', 
       y = 'Score', 
       animation_frame = 'Year', 
       color = 'GDP', 
       title = 'Top 15 Sad Countries and Their GDP per year')

Burundi was either last or second to last in 2016-2018, but in 2019 it moved up to 12th from last, a noteworthy advancement.

Syria moved to a slightly higher place. But as you can see, its GDP decreased over time.

Afghanistan had many ups and downs. In 2016 it was 4th from last, and it jumped to 15th from last the very next year. Since then, Afghanistan has kept falling, which is alarming.

Uganda, Burkina Faso and Chad appeared in the 2016 bar plot but vanished from the 2017-2019 plots, a remarkable improvement. Benin, Togo and Guinea appeared in the 2016 and 2017 bar plots but dropped out in 2018 and 2019, which is again impressive.
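
Which countries enter or leave the bottom group between two years can also be found with set differences instead of reading the bar plots. This is a sketch on made-up data; `bottom_n` is a hypothetical helper and the country names are placeholders.

```python
import pandas as pd

def bottom_n(df: pd.DataFrame, year: int, n: int = 15) -> set:
    """Countries with the n lowest scores in the given year."""
    yearly = df[df['Year'] == year]
    return set(yearly.nsmallest(n, 'Score')['Country'])

# Toy data with placeholder country names
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'] * 2,
    'Year': [2016] * 4 + [2017] * 4,
    'Score': [3.0, 3.5, 4.0, 5.0, 3.1, 4.5, 4.0, 5.0],
})

left_bottom = bottom_n(toy, 2016, n=2) - bottom_n(toy, 2017, n=2)
entered_bottom = bottom_n(toy, 2017, n=2) - bottom_n(toy, 2016, n=2)
print(left_bottom, entered_bottom)
```

Leaving the bottom group only means other countries scored lower that year, so it is worth checking the score itself before reading it as an improvement.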

In [None]:
# Creating data frames to plot top rich countries over the years
richests_2015 = data2015.sort_values('GDP',
                                     ascending=False).head(n=15).reset_index(drop=True)

richests_2016 = data2016.sort_values('GDP',
                                     ascending=False).head(n=15).reset_index(drop=True)

richests_2017 = data2017.sort_values('GDP',
                                     ascending=False).head(n=15).reset_index(drop=True)

richests_2018 = data2018.sort_values('GDP',
                                     ascending=False).head(n=15).reset_index(drop=True)

richests_2019 = data2019.sort_values('GDP',
                                     ascending=False).head(n=15).reset_index(drop=True)

all_rich_countries = pd.concat([richests_2015, richests_2016, richests_2017,
                      richests_2018, richests_2019], ignore_index=True)

# Visualization of the top rich countries
px.bar(data_frame = all_rich_countries, 
       x = 'Country', 
       y = 'Score', 
       animation_frame = 'Year',
       color = 'Score', 
       title = 'Happiness Scores of the Richest Countries')

Among the rich countries, only Hong Kong seems to be slightly unhappy. This finding may require extra investigation. Apart from Hong Kong, North American, oil-rich Arab, and Northern European countries are in the top 15 for all years. Bahrain appeared on the list in 2017 and disappeared afterwards.

We also observe that rich countries have a happiness score of at least 6.00.

In [None]:
# Creating data frames to plot top poor countries over the years
poorests_2016 = data2016.sort_values('GDP',
                                      ascending=True).head(n=15).reset_index(drop=True)

poorests_2015 = data2015.sort_values('GDP',
                                      ascending=True).head(n=15).reset_index(drop=True)

poorests_2017 = data2017.sort_values('GDP',
                                      ascending=True).head(n=15).reset_index(drop=True)

poorests_2018 = data2018.sort_values('GDP',
                                      ascending=True).head(n=15).reset_index(drop=True)

poorests_2019 = data2019.sort_values('GDP',
                                      ascending=True).head(n=15).reset_index(drop=True)

all_poor_countries = pd.concat([poorests_2015,poorests_2016, poorests_2017,
                       poorests_2018, poorests_2019], ignore_index=True)

# Visualization of the top poor countries
px.bar(data_frame = all_poor_countries, 
       x = 'Country', 
       y = 'Score', 
       animation_frame = 'Year',
       color = 'Score', 
       title = 'Happiness Scores of Poorest Countries')

As with the rich countries, there are poor countries that can be considered semi-happy (the Somaliland Region, for instance). And again, as with the rich countries, these values seem to be outliers. Generally speaking, countries with a low GDP have a happiness score in the range 2.8 to 4.5.

# Time Series Analysis

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(15, 7))
sns.boxplot(data = happiness.drop(['Rank', 'Year', 'Score'], axis=1), ax=ax)
plt.show()

In [None]:
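from scipy.stats import zscore

# Absolute z-scores of the numeric columns; a row with any |z| >= 3 is treated as an outlier
numeric = happiness.select_dtypes(include='number')
z = np.abs(zscore(numeric, axis=0, ddof=0, nan_policy='omit'))
numeric[(z<3).all(axis=1)]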

In [None]:
no_outliers = happiness.select_dtypes(include='number')[(z<3).all(axis=1)]
no_outliers['Year'].value_counts()

## World Maps for Happiness Scores

In [None]:
# World map for happiness score over the years
fig = px.choropleth(data_frame = happiness, 
                    locations = 'Country',
                    locationmode = 'country names',
                    animation_frame ='Year',
                    color = 'Score',
                    hover_name = 'Country',
                    color_continuous_scale = px.colors.sequential.Plasma)
fig.show()

In [None]:
# World map for GDP over the years
fig = px.choropleth(happiness, 
                    locations = 'Country',
                    locationmode = 'country names',
                    animation_frame = 'Year',
                    color = 'GDP',
                    hover_name = 'Country',
                    color_continuous_scale = px.colors.sequential.Plasma)
fig.show()

## Score change over the years

Now I want to investigate how happiness scores changed over the years. I will subtract each country's last available happiness score (2019 for most countries) from its first one (2015 for most countries) and look at the countries whose scores dropped drastically, if there are any.
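
A variant that pins the comparison to two explicit years is sketched below. It assumes a hypothetical helper `score_change` and toy data, and it uses the opposite sign convention (end minus start), so a negative value means the country became less happy. Countries missing in either year are dropped instead of silently using another year.

```python
import pandas as pd

def score_change(df: pd.DataFrame, start: int = 2015, end: int = 2019) -> pd.Series:
    """Score in `end` minus score in `start`, for countries present in both years."""
    wide = df.pivot_table(index='Country', columns='Year', values='Score')
    change = (wide[end] - wide[start]).dropna()
    return change.sort_values()

# Toy data with placeholder country names; 'C' has no 2019 entry
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C'],
    'Year': [2015, 2019, 2015, 2019, 2015],
    'Score': [6.0, 5.0, 4.0, 4.8, 5.5],
})
change = score_change(toy)
print(change)
```

With the sorted result, `change.head()` lists the largest drops and `change.tail()` the largest gains.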


In [None]:
columns = ['Country', 'Score', 'Year']
happiness_scores = happiness.loc[:, columns]
score_diiference = happiness_scores.groupby('Country')['Score'].agg(['first','last'])
score_diiference['difference'] = score_diiference['first'] - score_diiference['last']
score_diiference.sort_values('difference', inplace=True)

In [None]:
# Top five and last five differences
score_diiference.iloc[np.r_[0:5, -5:0]]

Venezuela and Lesotho became considerably less happy over time.

I would like to look at the yearly happiness scores of the countries with the largest changes.

In [None]:
pos_dif_coun = score_diiference.head(n=4).index
neg_dif_coun = score_diiference.tail(n=4).index

fig, ax = plt.subplots(1, 2, figsize=(15, 8))
for pos_con, neg_con in zip(pos_dif_coun, neg_dif_coun):
    sns.lineplot(x=happiness_scores[happiness_scores['Country']==pos_con]['Year'],
               y=happiness_scores[happiness_scores['Country']==pos_con]['Score'], 
               label=pos_con, ax=ax[0], marker='o')

    sns.lineplot(x=happiness_scores[happiness_scores['Country']==neg_con]['Year'],
               y=happiness_scores[happiness_scores['Country']==neg_con]['Score'], 
               label=neg_con, ax=ax[1], marker='o')

ax[0].set_title('Positive Happiness Score Changes', fontdict={'fontweight':'bold', 'color':'green'})
ax[1].set_title('Negative Happiness Score Changes', fontdict={'fontweight':'bold', 'color':'red'})

for i in range(2):
    ax[i].xaxis.set_major_locator(MaxNLocator(integer=True))
    ax[i].set_xlabel('Year', fontweight='bold')
    ax[i].set_ylabel('Score', fontweight='bold')
    ax[i].legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
    ax[i].grid(False)

plt.show()

## Transformation of Features

In [None]:
columns = ['GDP','Family', 'Life Expectancy', 'Freedom', 'Trust', 'Generosity']
yearwise_mean = happiness.groupby(by=['Year'])[columns].mean()
yearwise_mean

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(13, 9))
for column in columns:
    plot = sns.lineplot(x=yearwise_mean.index, y=yearwise_mean[column], ax=ax, 
               label=column, marker='o')

    for index, value in yearwise_mean[column].items():
        plot.text(x=index, y=round(value, 2)*1.02, s=round(value, 2), 
                  ha='center', color='black')

ax.set_title('Changes of Features Over The Years', fontweight='bold')
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_xlabel('Year', fontweight='bold')
ax.set_ylabel('Mean Value', fontweight='bold')
ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1, 
          loc='center left', bbox_to_anchor=(1, 0.5))
ax.grid(False)
plt.show()

All of the features above changed over time. 'GDP' increased for the first few years, which suggests economic growth, but dipped in 2018 before recovering the next year. 'Family' started poorly, declining by 0.197 in the first year, but recovered strongly in 2017 with a rise of 0.395 and kept moving upward until the end. 'Life Expectancy' declined in the first few years but recovered sharply within two years, possibly because of improvements in medical care, education and lifestyle. 'Freedom' had ups and downs: it first declined, then rose, then declined again. 'Trust' declined continuously, which suggests it needs priority. 'Generosity' followed a pattern very similar to GDP: it increased for the first few years, declined for a year and then rose slightly.
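
Year-over-year changes such as the ones quoted above can be read directly from `DataFrame.diff`. In the sketch below the yearly means are illustrative values, chosen only so that the `Family` differences match the 0.197 and 0.395 mentioned in the text; the `GDP` values are made up.

```python
import pandas as pd

# Illustrative yearly means, not computed from the dataset
yearwise = pd.DataFrame(
    {'GDP': [0.85, 0.95, 0.98], 'Family': [0.991, 0.794, 1.189]},
    index=pd.Index([2015, 2016, 2017], name='Year'),
)

# Difference between each year and the previous one; the first year has no predecessor
yoy = yearwise.diff().dropna()
print(yoy.round(3))
```

On the notebook's data, `yearwise_mean.diff()` would give the full table of changes for all six features.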

# Conclusion

There are many factors that affect the happiness of a country's citizens. With the features in our data, we can assume that mainly GDP, Family support and Life Expectancy affect a country's happiness. In addition, the Trust scores of happy countries are correlated with their happiness scores. These findings led us to one simple explanation.

Our explanation is simple: **Maslow's Hierarchy of Needs**

![image](https://www.simplypsychology.org/maslow-needs2.webp)

[Image Credit](https://www.simplypsychology.org/maslow.html)

According to Maslow's hierarchy, once you can secure food, water and a safe place to sleep, and can protect yourself from harm, you can start to think about others. These 'others' are first your loved ones: family, significant other, children and so on. Once you feel you have provided safety, love and other essentials for this inner circle, you can think about self-fulfillment needs.
At this point, an individual thinks about self-actualization. We humans pursue it in many different ways, and generosity is one of them.

The unhappy countries are mostly in Africa and the Middle East. They are dealing with poverty, hunger and safety issues, if not civil wars. So it seems likely that people in these countries have little opportunity to think about anyone other than themselves and their families.