# In Search Of Happiness: When Is It?

The happiness challenge on the go!  
We have already answered some questions about happiness. You can read about this and much more [here](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00002_world_happiness/world_happiness.map.ipynb).  
Today we wonder *when* we become happier.  
We'll take [the World Happiness report from Kaggle](https://www.kaggle.com/unsdsn/world-happiness?select=2018.csv), which ranks 156 countries by their level of happiness on a 10-point scale.


## The World Happiness Report

Let's recall the dataset description from Kaggle:

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

You can read more [here](https://www.kaggle.com/unsdsn/world-happiness).

We are most interested in the following columns:

- `Country or region` - country name
- `Overall rank` - country's place in the rating
- `Score` - happiness score


## Introduction

We'll try to identify the relationship between the level of happiness and the age of the population by country.

[The World Factbook](https://www.cia.gov/library/publications/the-world-factbook) by CIA provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities.  
For our purposes, we will take the following indicators:

- `Life expectancy at birth` - the average number of years to be lived by a group of people born in the same year, if mortality at each age remains constant in the future. Life expectancy at birth is also a measure of overall quality of life in a country and summarizes the mortality at all ages.  
- `Median age` - the age that divides a population into two numerically equal groups; that is, half the people are younger than this age and half are older. It is a single index that summarizes the age distribution of a population. Currently, the median age ranges from a low of about 15 in Niger and Uganda to 40 or more in several European countries and Japan.  
- `Population growth rate` - the average annual percent change in the population, resulting from a surplus (or deficit) of births over deaths and the balance of migrants entering and leaving a country. The rate may be positive or negative.
- `Death rate` - the average annual number of deaths during a year per 1,000 population at midyear; also known as crude death rate.
- `Birth rate` - the average annual number of births during a year per 1,000 persons in the population at midyear; also known as crude birth rate.
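These three rates are linked. Birth and death rates are counted per 1,000 people, while the population growth rate is a percentage. Dividing the difference between the crude rates by 10 therefore gives the rate of natural increase in percent. Adding the net migration rate (also per 1,000) gives an approximate growth rate. Below is a minimal sketch of that arithmetic with hypothetical figures. The function names are ours, and net migration is not one of the indicators we load.

```python
def natural_increase_pct(birth_rate_per_1000: float, death_rate_per_1000: float) -> float:
    """Rate of natural increase in percent, from crude rates per 1,000 people."""
    # (births - deaths) per 1,000 people; dividing by 10 converts per-1,000 to percent
    return (birth_rate_per_1000 - death_rate_per_1000) / 10


def growth_rate_pct(birth_rate_per_1000: float, death_rate_per_1000: float,
                    net_migration_per_1000: float = 0.0) -> float:
    """Approximate population growth rate in percent, including net migration."""
    return (birth_rate_per_1000 - death_rate_per_1000 + net_migration_per_1000) / 10


# Hypothetical country: 30 births, 10 deaths and -2 net migrants per 1,000 people a year
print(natural_increase_pct(30, 10))  # 2.0
print(growth_rate_pct(30, 10, -2))   # 1.8
```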

We will compare the happiness scores with the CIA indicators for 2018, the year for which the data are the most complete.  

Let's find out ***when*** *is happiness*.


## Reading The Data

In [None]:
# Import libs
from glob import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Happiness Reports

As we mentioned above, we'll read the happiness report published in 2018.  
We'll also rename the columns to snake_case.

In [None]:
# Set the width to display
pd.set_option('display.width', 120)
# Increase the number of rows to display
pd.set_option('display.max_rows', 60) 

# Get the data
happiness = pd.read_csv('/kaggle/input/world-happiness/2018.csv')

# Column map to rename
cols_dict = {'Country':'country',
             'Country or region':'country',
             'Region':'region',
             'Happiness Rank':'rank',
             'Happiness.Rank':'rank',
             'Overall rank':'rank',
             'Happiness Score':'score',
             'Happiness.Score':'score',
             'Score':'score',
             'Economy (GDP per Capita)':'gdp_per_capita',
             'Economy..GDP.per.Capita.':'gdp_per_capita',
             'GDP per capita':'gdp_per_capita',
             'Family':'family',
             'Freedom':'freedom',
             'Freedom to make life choices':'freedom',
             'Generosity':'generosity',
             'Health (Life Expectancy)':'life_expectancy',
             'Health..Life.Expectancy.':'life_expectancy',
             'Healthy life expectancy':'life_expectancy',
             'Perceptions of corruption':'trust_corruption',
             'Trust (Government Corruption)':'trust_corruption',
             'Trust..Government.Corruption.':'trust_corruption',
             'Social support':'social_support',
             'Dystopia Residual':'dystopia_residual',
             'Dystopia.Residual':'dystopia_residual',
             'Standard Error':'standard_error',
             'Upper Confidence Interval':'whisker_high',
             'Whisker.high':'whisker_high',
             'Lower Confidence Interval':'whisker_low',
             'Whisker.low':'whisker_low'
            }

# Rename the columns
happiness.rename(columns=cols_dict, inplace=True)

print(happiness.columns) # check the new column names
happiness.head() # check the values
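`DataFrame.rename` silently skips mapping keys that are not among the columns, so a single dictionary like `cols_dict` can serve reports whose headers differ from year to year. The sketch below uses two hypothetical two-row frames. The header variants are taken from `cols_dict`, and the values are made up.

```python
import pandas as pd

# A reduced column map with header variants from different report years
cols_map = {'Country': 'country',
            'Country or region': 'country',
            'Happiness Rank': 'rank',
            'Overall rank': 'rank',
            'Happiness Score': 'score',
            'Score': 'score'}

# Hypothetical frames with older-style and 2018-style headers
df_old = pd.DataFrame({'Country': ['A', 'B'], 'Happiness Rank': [1, 2], 'Happiness Score': [7.5, 7.4]})
df_2018 = pd.DataFrame({'Overall rank': [1, 2], 'Country or region': ['A', 'B'], 'Score': [7.6, 7.5]})

# Keys missing from a frame are ignored (errors='ignore' is the default)
print(df_old.rename(columns=cols_map).columns.tolist())   # ['country', 'rank', 'score']
print(df_2018.rename(columns=cols_map).columns.tolist())  # ['rank', 'country', 'score']
```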

In [None]:
happiness.info()

The 2018 report covers 156 countries. There are no missing values in the `country`, `rank`, and `score` columns.

Let's check for duplicates.

In [None]:
# Count rows whose country name repeats an earlier row
print('Duplicated: {}'.format(happiness.duplicated(subset='country').sum()))

There are no duplicates. Let's get the CIA data.
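One note on the duplicate count: with the default `keep='first'`, `duplicated()` flags only the repeats and not the first occurrence. The sum is therefore the number of extra rows, not the number of affected countries. `keep=False` flags every copy, which is handy for inspecting them. The sketch below uses an invented frame.

```python
import pandas as pd

# Invented frame in which one country appears twice
df = pd.DataFrame({'country': ['Finland', 'Norway', 'Finland'], 'score': [7.6, 7.5, 7.6]})

# Only the second 'Finland' row is flagged
print(df.duplicated(subset='country').sum())  # 1

# keep=False flags both 'Finland' rows so they can be inspected side by side
print(df[df.duplicated(subset='country', keep=False)])
```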

### CIA Reports

We have downloaded The World Factbook archive for different years and saved the data that was collected in 2018.

In [None]:
cia_files = sorted(glob('/kaggle/input/the-world-factbook-by-cia/cia.age.*.2018.txt'))

cia = pd.DataFrame()

for file in cia_files:
    indicator = file.split('.')[2] # indicator name taken from the file name, e.g. 'median_age'
    c = pd.read_csv(file,
                    engine='python', sep=r'\s{3,}', header=None,
                    names=['country_cia', indicator, 'data_year'],
                    skiprows=1, index_col=0,
                    thousands=',', dtype={indicator:'float64'}
                   )[['country_cia', indicator]] # read the file
    if cia.empty:
        cia = c.copy() # for the first file (DataFrame.append was removed in pandas 2.0)
        print('Initialize {}: {}'.format(indicator, cia.shape[0]))
    else:
        cia = cia.merge(c, on='country_cia', how='outer')
        print('Merge {}: {}'.format(indicator, cia.shape[0]))

cia = cia.reset_index(drop=True) # reset_index() returns a new frame, so assign it back

cia.info()
cia
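The separator `r'\s{3,}'` (three or more whitespace characters) keeps multi-word country names such as `Trinidad and Tobago` in a single field, because the single spaces inside a name are not treated as separators. The sketch below parses an in-memory text. It assumes a rank / country / value / date layout, which is our reading of the `names` list and `index_col=0` above, and the values are made up.

```python
import io

import pandas as pd

# Illustrative text in the assumed layout; the numbers are invented
raw = """\
RANK   COUNTRY   MEDIAN AGE   DATE
1   Japan   47.0   2018 est.
2   United States   38.0   2018 est.
3   Trinidad and Tobago   36.0   2018 est.
"""

df = pd.read_csv(io.StringIO(raw), engine='python', sep=r'\s{3,}',
                 header=None, skiprows=1,
                 names=['rank', 'country', 'median_age', 'data_year'])
# 'United States' and '2018 est.' stay whole: they contain only single spaces
print(df[['country', 'median_age']])
```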

It is interesting to see what the median age and life expectancy in the world are.

In [None]:
cia.describe()

Averaged across countries, life expectancy at birth is about 73 years and the median age is about 30 years. Each country counts equally in these averages, whatever the size of its population.  
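`describe()` gives every country the same weight, so its mean of the median ages is not the median age of the world's population. The toy example below uses invented numbers for three countries to show how much population weighting can shift the figure. Even the weighted value is only an approximation, because a true world median would require the full age distribution.

```python
import numpy as np

# Invented median ages and populations (millions) for three countries
median_age = np.array([18.0, 30.0, 45.0])
population = np.array([200.0, 50.0, 50.0])

# Unweighted mean across countries, as cia.describe() computes it
print(median_age.mean())                           # 31.0
# Population-weighted mean of the country medians
print(np.average(median_age, weights=population))  # 24.5
```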
What are the countries with the lowest life expectancy at birth and median age?  
Let's list the TOP5 countries for each indicator.

In [None]:
# Print the countries with the min life expectancy at birth
print('TOP5 countries with the min life expectancy at birth')
cia[['country_cia', 'life_expectancy_at_birth']].sort_values(by='life_expectancy_at_birth', ascending=True).head()

In [None]:
# Print the countries with the min median age
print('TOP5 countries with the min median age')
cia[['country_cia', 'median_age']].sort_values(by='median_age', ascending=True).head()

What are the countries with the highest life expectancy at birth and median age?

In [None]:
# Print the countries with the max life expectancy at birth
print('TOP5 countries with the max life expectancy at birth')
cia[['country_cia', 'life_expectancy_at_birth']].sort_values(by='life_expectancy_at_birth', ascending=False).head()

In [None]:
# Print the countries with the max median age
print('TOP5 countries with the max median age')
cia[['country_cia', 'median_age']].sort_values(by='median_age', ascending=False).head()
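`DataFrame.nsmallest` and `DataFrame.nlargest` are a shorter way to get the same TOP5 lists as `sort_values(...).head()`. Here is a sketch with made-up values:

```python
import pandas as pd

# Made-up indicator table
df = pd.DataFrame({'country_cia': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'median_age': [15.5, 42.0, 19.8, 47.3, 30.1, 16.2]})

lowest = df.nsmallest(5, 'median_age')
highest = df.nlargest(5, 'median_age')

# Same rows, in the same order, as the sort_values(...).head() versions
assert lowest.equals(df.sort_values(by='median_age').head())
assert highest.equals(df.sort_values(by='median_age', ascending=False).head())
print(lowest['country_cia'].tolist())  # ['A', 'F', 'C', 'E', 'B']
```

With ties, the `keep` parameter (`'first'` by default) decides which rows are returned.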

## Preparing The Data Sets

Now we should combine the `happiness` and `cia` datasets.

First, we need to check the columns that will be used for merging.
Country names may differ between the data sets, for instance, `eSwatini` and `Swaziland`, or `Trinidad and Tobago` and `Trinidad & Tobago`. Such rows will not match.

First, we copy the `country_cia` column of `cia` into a new column, `country`.

In [None]:
cia['country'] = cia['country_cia']

Let's compare the `country` column of the `happiness` data set with that of the `cia` data set.
To do this, we'll combine the two data sets using an `outer` join and look at the rows with missing values.

In [None]:
happiness_cia = happiness.merge(cia, on='country', how='outer')[['country', 'score', 'population_growth']]

pd.set_option('display.max_rows', 100) # increase the number of rows to display
happiness_cia[happiness_cia.isnull().any(axis=1)].sort_values(by=['score', 'country']) # the countries don't match
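Another way to list the names that fail to match is the `indicator=True` option of `DataFrame.merge`. It adds a `_merge` column with the values `'left_only'`, `'right_only'`, or `'both'`. The sketch below uses hypothetical country lists, with name variants taken from the rename map below and invented values.

```python
import pandas as pd

# Hypothetical rows from the two sources
left = pd.DataFrame({'country': ['Czech Republic', 'Ivory Coast', 'Finland'], 'score': [6.7, 5.0, 7.6]})
right = pd.DataFrame({'country': ['Czechia', "Cote d'Ivoire", 'Finland'], 'median_age': [43.0, 20.0, 42.0]})

merged = left.merge(right, on='country', how='outer', indicator=True)
# Rows that exist on only one side point to names that need harmonizing
mismatched = merged.loc[merged['_merge'] != 'both', ['country', '_merge']]
print(mismatched)
```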

To provide the same country names:

- Create a dictionary mapping all names to the values in the `happiness` dataset since we explore the happiness data.
- Rename the countries in the `cia` dataset by replacing the values according to the map dictionary.

In [None]:
# Countries map to rename
country_to_rename = {'Cote d\'Ivoire':'Ivory Coast',
                     'Congo, Republic of the':'Congo (Brazzaville)',
                     'Congo, Democratic Republic of the':'Congo (Kinshasa)',
                     'Burma':'Myanmar',
                     'Korea, South':'South Korea',
                     'Czechia':'Czech Republic',
                     'Trinidad and Tobago':'Trinidad & Tobago'
                    }
# Rename the countries (assign the result back; in-place replace on a column is unreliable in recent pandas)
cia['country'] = cia['country'].replace(country_to_rename)

cia.sample(5, random_state=11) # check the values randomly

We can now use the `country` column to merge the two datasets.

In [None]:
happiness_cia = happiness.merge(cia, on='country', how='left').copy()

happiness_cia.info()

In [None]:
happiness_cia.sort_values(by='median_age', ascending=False)

Unfortunately, Monaco, the country with the highest median age, is not covered by the happiness report, so the left join leaves it out. This doesn't prevent us from evaluating the relationships between the indicators at a high level.  
Let's build a correlation matrix of the selected columns.

In [None]:
# Select the columns of interest
cols_corr = ['country', 'score',
             'life_expectancy_at_birth',
             'median_age', 'birth',
             'population_growth', 'death'
            ]
happiness_cia = happiness_cia[cols_corr].copy() # an explicit copy, since we add a column to it later
happiness_cia

In [None]:
# Get correlation matrix (numeric columns only; 'country' holds text)
happiness_cia_corr = happiness_cia.drop(columns='country').corr()
happiness_cia_corr
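By default, `DataFrame.corr()` computes Pearson's coefficient, $r = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2}}$, and it drops missing values pair by pair. The sketch below reproduces one cell by hand on invented values for five countries.

```python
import numpy as np
import pandas as pd

# Invented values; the last country has no happiness score
df = pd.DataFrame({'score': [4.0, 5.0, 6.0, 7.0, np.nan],
                   'median_age': [18.0, 25.0, 30.0, 41.0, 35.0]})

# pandas: Pearson's r over the rows where both values are present
r_pandas = df.corr().loc['score', 'median_age']

# The same coefficient by hand on the complete pairs
pairs = df.dropna()
x = pairs['score'] - pairs['score'].mean()
y = pairs['median_age'] - pairs['median_age'].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(round(r_pandas, 4), round(r_manual, 4))
```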

Some of the indicators are clearly related. 
Let's plot the most interesting pairs.

## Life Expectancy at Birth And Happiness

In [None]:
%matplotlib inline

# Turn on svg rendering
%config InlineBackend.figure_format = 'svg'

# Color palette for the blog
snark_palette = ['#e0675a', # red
                 '#5ca0af', # green
                 '#edde7e', # yellow
                 '#211c47' # dark blue
                ]

In [None]:
# Color palette for the data
palette = 'summer_r'

# Inscriptions
title = """The Relationship Between Life Expectancy at Birth And Happiness"""
description = """
Correlation of the life expectancy at birth with the happiness score by country based on 2018 data.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='ticks', palette=snark_palette,
        rc={'xtick.major.size': 4, 'ytick.major.size':4,
            'axes.spines.left': True, 'axes.spines.bottom': True,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
fig = plt.figure(figsize=figsize, facecolor='w')
ax = sns.scatterplot(x='life_expectancy_at_birth', y='score',
                     hue=happiness_cia['score'].tolist(),
                     size=happiness_cia['life_expectancy_at_birth'].tolist(),
                     sizes=(10,100),
                     data=happiness_cia,
                     palette=palette, legend=False
                    )

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.1, -0.015), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.spines['bottom'].set_linestyle((0, (1, 10)))
ax.spines['bottom'].set_color(snark_palette[-1])
ax.spines['left'].set_linestyle((0, (1, 10)))
ax.spines['left'].set_color(snark_palette[-1])
ax.set_xlabel('Life Expectancy at Birth', horizontalalignment='center', size='x-large', c=snark_palette[-1]) # set label of x axis
ax.set_xticks([i for i in range(50, 100, 10)])
ax.set_xticklabels([i for i in range(50, 100, 10)], c=snark_palette[-1])
ax.set_ylabel('Score', horizontalalignment='center', size='x-large', c=snark_palette[-1]) # set label of y axis
ax.set_yticks([i for i in range(2, 9)])
ax.set_yticklabels([i for i in range(2, 9)], c=snark_palette[-1])
ax.tick_params(axis='both', labelsize='small', colors=snark_palette[-1], direction='out') # set x/y ticks

# Save and plot
plt.savefig('/kaggle/working/plot.happiness.life_exp_at_birth.png', dpi=150, bbox_inches='tight')
plt.show()

## Median Age And Happiness Score

We would like to see how happiness is distributed according to the age of the population.  
To do this, we will divide the countries by age group based on the median age.

In [None]:
# Calculate min and max median age over the countries we are going to bin
median_age_min = happiness_cia['median_age'].min()
median_age_max = happiness_cia['median_age'].max()

# Calculate bins
bin_step = 5
bin_from = int((median_age_min // bin_step) * bin_step)
bin_to = int((median_age_max // bin_step) * bin_step + bin_step)
bins = [i for i in range(bin_from, bin_to + 1, bin_step)] # age groups; +1 keeps bin_to as the last edge

happiness_cia['age_range'] = pd.cut(happiness_cia['median_age'], bins)
happiness_cia_grouped = (happiness_cia[['age_range', 'median_age', 'score']].groupby('age_range', observed=True)
                                                                            .mean()
                                                                            .reset_index()
                        )
happiness_cia_grouped
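`pd.cut` builds intervals closed on the right, `(15, 20]`, `(20, 25]`, and so on. A value equal to the lowest edge therefore falls outside every bin unless `include_lowest=True` is passed, and a value above the last edge is dropped as well. Here is a quick check with made-up ages:

```python
import pandas as pd

# Made-up median ages; the edges mirror 5-year bins from 15 to 50
ages = pd.Series([15.0, 17.5, 20.0, 20.1, 47.3])
bins = list(range(15, 51, 5))  # 15, 20, ..., 50

right_closed = pd.cut(ages, bins)
print(right_closed.tolist())  # 15.0 becomes NaN: it is not inside (15, 20]

lowest_included = pd.cut(ages, bins, include_lowest=True)
print(lowest_included.isna().sum())  # 0: the first interval now also covers 15.0
```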

In [None]:
# Inscriptions
title = """The Relationship Between Median Age And Happiness"""
description = """
Correlation of the median age with the happiness score by country based on 2018 data.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='whitegrid', palette=snark_palette,
        rc={'xtick.major.size': 4, 'ytick.major.size':4,
            'axes.spines.left': False, 'axes.spines.bottom': False,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
fig = plt.figure(figsize=figsize, facecolor='w')
ax = sns.violinplot(x='age_range', y='score', data=happiness_cia,
                    inner='quart', linewidth=1,
                    palette='spring_r'
                   )

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.03, 0), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.xaxis.set_label_text('') # remove label of x axis
ax.text(s='Median Age', x=3, y=2.2, horizontalalignment='center', verticalalignment='center', size='large', c=snark_palette[-1]) # set label of x axis
ax.set_ylabel('Happiness score', horizontalalignment='center', size='large', c=snark_palette[-1]) # set label of y axis
ax.tick_params(axis='both', labelsize='medium', colors=snark_palette[-1]) # set x/y ticks

# Save and plot
plt.savefig('/kaggle/working/plot.happiness.median_age.png', dpi=150, bbox_inches='tight')
plt.show()

## Correlation Map

We'll plot the correlation matrix below, but first, we'll prepare the data to make the plot easier to read.

In [None]:
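# Drop the first row and the last column: they hold only the diagonal and repeated values
happiness_cia_corr = happiness_cia_corr.iloc[1:, :-1]
# A triangular mask that hides the cells above the diagonal
mask = np.triu(np.ones_like(happiness_cia_corr), k=1)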

# Readable names for the plot
cols_dict = {'score':'Happiness',
             'life_expectancy_at_birth':'Life exp.\nat birth',
             'median_age':'Median\nage',
             'birth':'Birth',
             'population_growth':'Population\ngrowth',
             'death':'Death'
            }
# Rename columns in the correlation matrix
happiness_cia_corr.rename(columns=cols_dict, index=cols_dict, inplace=True)
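Combining the trim with `np.triu(..., k=1)` leaves exactly the strict lower triangle of the original matrix, so each pair of indicators appears once and the all-ones diagonal disappears. The sketch below uses a toy 4x4 matrix whose values are just positions.

```python
import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd']
corr = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4), index=cols, columns=cols)

# Rows b, c, d and columns a, b, c remain after the trim
trimmed = corr.iloc[1:, :-1]
# True strictly above the diagonal of the trimmed frame; seaborn hides cells where mask is True
mask = np.triu(np.ones(trimmed.shape, dtype=bool), k=1)

visible = trimmed.where(~mask)  # masked cells become NaN
print(visible)
```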

In [None]:
# Color palette for the data
palette = [snark_palette[0], # red
           'lightgrey',
           snark_palette[1] # green
          ]

# Inscriptions
title = """Relationship Between Age Indicators And The Happiness score"""
description = """
Сorrelation of age indicators with the happiness score by country based on 2018 data.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='ticks', palette=palette,
        rc={'xtick.bottom':False, 'ytick.left':False, 
            'axes.spines.left': False, 'axes.spines.bottom': False,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
fig, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w')
sns.heatmap(happiness_cia_corr, mask=mask, cmap=palette,
            vmin=-1, vmax=1, center=0,
            square=False, linewidths=.5, annot=True, fmt='.2g',
            cbar_kws={'shrink': 1, 'ticks':[], 'label':'-1 negative <- correlation -> positive +1'},
            ax=ax)

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(20, -4), size=6, xycoords='figure points', c=snark_palette[-1])
ax.tick_params(axis='both', colors=snark_palette[-1]) # set x/y ticks
ax.set_yticklabels(ax.get_yticklabels(), rotation=0) # set rotation for y tick labels

# Save and plot
plt.savefig('/kaggle/working/plot.happiness.age.png', dpi=150, bbox_inches='tight')
plt.show()

As expected, the older the population, the lower the birth rate. What is surprising is how strong the correlation between birth rate and median age is.  
Countries with younger populations have higher population growth rates, but they are less happy.

## Median Age And Birth

In [None]:
# Inscriptions
title = """The Relationship Between Median Age And Birth"""
description = """
Correlation of the median age with the birth rate by country based on 2018 data.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar
"""

# Set the figure
sns.set(context='paper', style='ticks', palette=snark_palette,
        rc={'xtick.major.size': 4, 'ytick.major.size':4,
            'axes.spines.left': False, 'axes.spines.bottom': False,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot (jointplot creates its own figure, so plt.figure() is not needed)
g = sns.jointplot(x='median_age', y='birth', data=happiness_cia,
                  kind='reg', truncate=False, dropna=True,
                  xlim=(10, 50), ylim=(0, 50),
                  marginal_kws=dict(bins=10), # 10-bin histograms on the margins
                  color=snark_palette[0]
                 )

# Set some aesthetic params for the plot
g.ax_marg_x.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot
g.ax_marg_x.annotate(description, xy=(0.015, -0.01), size=6, xycoords='figure fraction', c=snark_palette[-1])
g.ax_joint.set_xlabel('Median Age', horizontalalignment='center', size='x-large', c=snark_palette[-1]) # set label of x axis
g.ax_joint.set_ylabel('Birth', horizontalalignment='center', size='x-large', c=snark_palette[-1]) # set label of y axis
g.ax_joint.tick_params(axis='both', labelsize='large', colors=snark_palette[-1]) # set x/y ticks
g.ax_joint.spines['bottom'].set_color(snark_palette[-1]) # color x axis
g.ax_joint.spines['left'].set_color(snark_palette[-1]) # color y axis
g.ax_marg_x.tick_params(axis='x', bottom=False) # disable x margin ticks
g.ax_marg_x.spines['bottom'].set_color(snark_palette[0])
g.ax_marg_y.tick_params(axis='y', left=False) # disable y margin ticks
g.ax_marg_y.spines['left'].set_color(snark_palette[0])

# Save and plot
plt.savefig('/kaggle/working/plot.happiness.age.birth.png', dpi=150, bbox_inches='tight')
plt.show()
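With `kind='reg'`, seaborn draws an ordinary least-squares line in the joint panel. The sketch below shows what that line represents by running the same kind of linear fit with NumPy on invented (median age, birth rate) pairs. It does not reproduce seaborn's confidence band.

```python
import numpy as np

# Invented (median age, births per 1,000 people) pairs
median_age = np.array([16.0, 20.0, 28.0, 35.0, 42.0, 46.0])
birth_rate = np.array([45.0, 38.0, 25.0, 15.0, 10.0, 8.0])

# Least-squares line: birth_rate ~ slope * median_age + intercept
slope, intercept = np.polyfit(median_age, birth_rate, deg=1)
r = np.corrcoef(median_age, birth_rate)[0, 1]
print(f'slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}')
```

A negative slope together with an `r` close to -1 is the numerical counterpart of the falling line in the plot.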

## Conclusions

[In the previous project](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00002_world_happiness/world_happiness.where.map.ipynb), we identified the TOP5 least happy countries. This list includes Afghanistan and the Central African Republic.  
Both countries also appear among the TOP5 countries with the lowest life expectancy at birth.  

Life expectancy at birth reflects the overall quality of life in a country and the health of its population. Not surprisingly, the higher the life expectancy at birth, the higher the happiness score tends to be.  

Now let's look at the relationship between median age and happiness.  
It seems that a developing society reaches a point of saturation: beyond it, people live longer but do not become happier.  

Some correlations indicate that the older the population, the less willing (or able?) it is to reproduce itself:

- the higher the median age, the lower the population growth;
- the higher the median age, the lower the birth rate.

Earlier, we found out that thirty-three is a special age. You can find that research [here](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00001_thirty_years_old/thirty_years_old.ipynb).  
Curiously, the median age above which the happiness score no longer changes significantly is close to that age too: about thirty-five years.  
Even more interesting, the median age averaged over countries is about thirty years, as we saw above.  
Well, perhaps humanity is in its prime. On average, of course.