# In Search Of Happiness: Where Is It?

People all over the world are looking for a recipe for happiness. Some even try to measure happiness the way we measure height or temperature.  
For example, [this report from Kaggle](https://www.kaggle.com/unsdsn/world-happiness?select=2019.csv) ranks 156 countries by their level of happiness on a 10-point scale.


## The World Happiness Report

Quoting Kaggle:

> The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

You can read more [here](https://www.kaggle.com/unsdsn/world-happiness).


## Introduction

Even in our search for happiness, we've never been interested in anything but the data.  
With The World Happiness Report, we would like to discover the happiest places.

We are most interested in the following columns:

- `Country or region` - country name
- `Overall rank` - country's place in the rating
- `Score` - happiness score

In this project, we'll do the following:

1. Generate a world happiness map.
2. Determine happiness by region and continent. 
3. Review the five happiest and the five least happy countries.

As a result, we want to find out ***where*** *the happiest people live*.


## Reading The Report

In [None]:
# Import libs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd # to plot the map
import seaborn as sns # for other plots

Let's take the latest report, published in 2019.  
We'll rename the columns according to the snake_case format.

In [None]:
# Set the width to display
pd.set_option('display.width', 120)
# Increase the number of rows to display
pd.set_option('display.max_rows', 180) 

# Get the data
happiness = pd.read_csv('/kaggle/input/world-happiness/2019.csv')

# Column map to rename
cols_dict = {'Country':'country',
             'Country or region':'country',
             'Region':'region',
             'Happiness Rank':'rank',
             'Happiness.Rank':'rank',
             'Overall rank':'rank',
             'Happiness Score':'score',
             'Happiness.Score':'score',
             'Score':'score',
             'Economy (GDP per Capita)':'gdp_per_capita',
             'Economy..GDP.per.Capita.':'gdp_per_capita',
             'GDP per capita':'gdp_per_capita',
             'Family':'family',
             'Freedom':'freedom',
             'Freedom to make life choices':'freedom',
             'Generosity':'generosity',
             'Health (Life Expectancy)':'life_expectancy',
             'Health..Life.Expectancy.':'life_expectancy',
             'Healthy life expectancy':'life_expectancy',
             'Perceptions of corruption':'trust_corruption',
             'Trust (Government Corruption)':'trust_corruption',
             'Trust..Government.Corruption.':'trust_corruption',
             'Social support':'social_support',
             'Dystopia Residual':'dystopia_residual',
             'Dystopia.Residual':'dystopia_residual',
             'Standard Error':'standard_error',
             'Upper Confidence Interval':'whisker_high',
             'Whisker.high':'whisker_high',
             'Lower Confidence Interval':'whisker_low',
             'Whisker.low':'whisker_low'
            }

happiness.rename(columns=cols_dict, inplace=True) # rename the columns

print(happiness.columns) # check the new column names

happiness.head() # check the values
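
The column map above lists several spellings for each target name, apparently so that the same dictionary can serve files with different headers. Below is a minimal sketch of that idea on two toy frames. The names `cols_map`, `report_old` and `report_new` and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical subset of the column map, for illustration only
cols_map = {
    'Country': 'country',
    'Country or region': 'country',
    'Happiness.Score': 'score',
    'Score': 'score',
}

# Toy frames with two different header styles (values are invented)
report_old = pd.DataFrame({'Country': ['A'], 'Happiness.Score': [7.0]})
report_new = pd.DataFrame({'Country or region': ['A'], 'Score': [7.1]})

# rename() silently skips keys that are absent from a frame,
# so a single map can be applied to both header styles
renamed_old = report_old.rename(columns=cols_map)
renamed_new = report_new.rename(columns=cols_map)

print(list(renamed_old.columns))
print(list(renamed_new.columns))
```

Both frames end up with the same snake_case column names, which would make it easier to stack several files later if needed.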

In [None]:
happiness.tail(10) # last ten rows

In [None]:
happiness.info()

The report covers 156 countries, and there are no missing values. Every column except `country` has a numeric type.

Let's check for duplicates.

In [None]:
# Duplicated
print('Duplicated: {}'.format(happiness.duplicated(subset='country').sum()))

There are no duplicate countries.  
Let's continue with the plots.


## The World Happiness Map

We want to draw a world map and see how the happiness scores are distributed across it.  
We'll get the borders of countries from [Natural Earth Data](http://www.naturalearthdata.com/downloads/10m-cultural-vectors/).  

We'll create a GeoDataFrame from the `Admin 0 - Countries` shapefile.  
Here are some notes on the Natural Earth's data:

- Natural Earth uses `UTF-8` character encoding to support internationalization with a full range of language scripts.
- The projection specified in the `PRJ` file is `WGS84`, which is `EPSG:4326`.

We'll only be interested in the following columns:

- `NAME_LONG` - full name of the country
- `NAME` - name of the country
- `GEOUNIT` - label for the territory
- `ADM0_A3` - the 3-letter country codes defined in ISO 3166-1 alpha-3
- `CONTINENT`
- `SUBREGION` - subregion the country belongs to
- `REGION_WB` - region the country belongs to
- `LEVEL` - level of detail
- `geometry` - the country shapes as polygons

In [None]:
file_shape = '/kaggle/input/natural-earth/10m_cultural/10m_cultural/ne_10m_admin_0_countries_lakes.shp' # the shape file

cols = ['NAME_LONG', 'NAME', 'GEOUNIT', 'ADM0_A3', 'CONTINENT', 'SUBREGION', 'REGION_WB', 'LEVEL', 'geometry']

# Read the shapes
countries = gpd.read_file(file_shape)
countries = countries[cols].to_crs('EPSG:4326')

# Convert column names to lowercase
countries.columns = countries.columns.str.lower()

countries.sample(5, random_state=1) # check the values randomly

In [None]:
countries.info()

There are 255 countries, and there are no empty values for them.

According to the [`How-to` section](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-details/), let's check that the borders are at the country level, i.e. that `level` equals 2.

In [None]:
print('Level of detail: {}'.format(countries['level'].value_counts()))

Now, we can merge geo data with happiness data.  

First, we need to check the columns that will be used for merging.  
Country names may differ in data sets, for instance, `eSwatini` and `Swaziland`, `Trinidad and Tobago` and `Trinidad & Tobago`. In this case, the rows will not match.

Let's compare the `geounit` column and the `country` column of the `happiness` data set.  
To do this, we'll combine two data sets using `outer` join.

In [None]:
cols_check = ['country', 'geounit']
happiness_geo = countries.merge(happiness, left_on='geounit', right_on='country', how='outer')[cols_check]
happiness_geo[happiness_geo.isnull().any(axis=1)].sort_values(by='country') # the countries don't match
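
As a side note, an outer join can also report which side each row came from through the `indicator=True` option of `merge`. The sketch below uses small made-up frames (`left`, `right` and `name_map` are illustrative names, not part of the notebook) to show how mismatched names surface and how they disappear after renaming.

```python
import pandas as pd

# Toy data: two names differ between the sources
left = pd.DataFrame({'geounit': ['Czechia', 'Finland', 'eSwatini']})
right = pd.DataFrame({'country': ['Czech Republic', 'Finland', 'Swaziland']})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
check = left.merge(right, left_on='geounit', right_on='country',
                   how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched)

# Map the names on one side to the spelling used on the other, then re-check
name_map = {'Czechia': 'Czech Republic', 'eSwatini': 'Swaziland'}
left['country'] = left['geounit'].replace(name_map)
recheck = left.merge(right, on='country', how='outer', indicator=True)
n_unmatched = int((recheck['_merge'] != 'both').sum())
print('Non-matching rows after renaming:', n_unmatched)
```

Filtering on `_merge` avoids relying on null values, which can be handy when the joined frames already contain missing data of their own.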

To make the country names consistent:

- Create a dictionary mapping all names to the values in the `happiness` dataset since we explore the happiness data.
- Store the `geounit` column in a new column `country`.
- Rename the countries by replacing the values in the new `country` column according to the map dictionary.

In [None]:
# Countries map to rename
country_to_rename = {'Republic of the Congo':'Congo (Brazzaville)',
                     'Democratic Republic of the Congo':'Congo (Kinshasa)',
                     'Czechia':'Czech Republic',
                     'Hong Kong S.A.R.':'Hong Kong',
                     'Macedonia':'North Macedonia',
                     'Palestine':'Palestinian Territories',
                     'Republic of Serbia':'Serbia',
                     'eSwatini':'Swaziland',
                     'Trinidad and Tobago':'Trinidad & Tobago',
                     'United States of America':'United States'
                    }

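# Copy `geounit` into `country` and rename in one step
# (assigning the result avoids a chained in-place replace, which newer pandas versions warn about)
countries['country'] = countries['geounit'].replace(country_to_rename)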

countries.sample(5, random_state=3) # check the values randomly

We can now use the new `country` column to merge the two datasets.

In [None]:
happiness_geo = countries.merge(happiness, on='country', how='outer').copy()

# Check for full match
print('Non-matching countries: {}'.format( happiness_geo[cols_check].isnull().any(axis=1).sum() )
     )

In [None]:
happiness_geo.info()

There are 255 countries in total, 156 of which are ranked by their happiness levels.  

Let's create the World Happiness Map.

In [None]:
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline

# Turn on svg rendering
%config InlineBackend.figure_format = 'svg'

# Color palette for the blog
snark_palette = ['#e0675a', # red
                 '#5ca0af', # green
                 '#edde7e', # yellow
                 '#211c47' # dark blue
                ]

In [None]:
# ***With GeoPandas***

# Color palette for the data
palette = 'BrBG'

# Inscriptions
title = 'World Happiness in 2019'

description = """
Countries with happiness scores on a 10-point scale based on answers to the main life evaluation question asked in the poll.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness | Author: @data.sugar
"""

# Plot size
figsize = (8, 6)

# Create the plot
fig, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w') # set the size here: plot() ignores figsize when ax is given
divider = make_axes_locatable(ax) # add colorbar
cax = divider.append_axes('right', size="2%", pad=0.1) # set colorbar

ax = happiness_geo.plot(column='score',
                        cmap=palette, legend=True,
                        missing_kwds={'color': 'lightgrey', # for non-rated countries
                                      'label':'missing values'
                                     },
                        alpha=0.9, facecolor='white',
                        ax=ax, cax=cax
                       )

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 14}, loc='left', c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.03, 0.06), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.set_axis_off() # hide axes
ax.set_xlim([-170, 180])
ax.set_ylim([-65, 85])
cax.tick_params(colors=snark_palette[-1]) # color the colorbar ticks

# Save and plot
fig.subplots_adjust(bottom=0.025, top=0.88, left=0.025, right=0.9) # adjust for the post picture
plt.savefig('/kaggle/working/plot.happiness.map.png', dpi=150, bbox_inches='tight')
plt.show()

## Happiness by Continent and Region

It should be noted that we have only 156 ranked countries, not all countries in the world.  
With this in mind, let's continue with creating a happiness rating by continent/region.

We'll calculate the weighted average happiness score for each continent/region using the formula:

$$\overline{score}_{continent/region} = \frac{\displaystyle\sum_{i=1}^{n} score_{i} \cdot population_{i}}{\displaystyle\sum_{i=1}^{n} population_{i}},$$

where `n` is the number of ranked countries per continent/region.

The formula takes into account the population of each country in the region.
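
To make the formula concrete, here is a small worked example with three imaginary countries. The scores and populations are invented purely for illustration. It compares the weighted average with the plain mean.

```python
import numpy as np

# Invented scores and populations for three imaginary countries
scores = np.array([7.0, 5.0, 4.0])
population = np.array([5_000_000, 50_000_000, 45_000_000])

# Weighted average: sum(score_i * population_i) / sum(population_i)
weighted = (scores * population).sum() / population.sum()

# Plain (unweighted) mean for comparison
simple = scores.mean()

print('Weighted average: {:.2f}'.format(weighted))  # 4.65
print('Simple mean: {:.2f}'.format(simple))         # 5.33
```

The small, happy country barely moves the weighted result, because the two large countries carry most of the population.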

### Getting population by country

We'll take the population by country from [World Population Review](https://worldpopulationreview.com/#liveWorldPop).  
We have uploaded the data to the `population.csv` file.  
As we explore the happiness report for 2019, we should get the population for 2019.

Let's read the file and examine the data.

In [None]:
pop = pd.read_csv('/kaggle/input/population-2019/population.csv')[['name', 'pop2019']]
pop['pop2019'] = pop['pop2019'] * 1000 # the population is given in thousands
pop.head()

We'll select only the ranked countries and then combine them with the population data.

In [None]:
happiness_geo = happiness_geo[happiness_geo['score'].notnull()]

First, we need to check the key columns before merging.  
We'll compare the `country` column and the `name` column of the `pop` data set.

In [None]:
cols_check = ['geounit', 'country', 'name_pop']
happiness_geo_pop = happiness_geo.merge(pop, how='outer',
                                        left_on='country', right_on='name',
                                        suffixes=(None, '_pop')
                                       )[cols_check]
happiness_geo_pop[happiness_geo_pop.isnull().any(axis=1)] # the countries don't match

As in the previous case, we'll bring these names to the same values:

- Create a dictionary mapping all names to the values in the `happiness` dataset since we explore the happiness data.
- Store the `name` column in a new column `country`.
- Rename the countries by replacing the values in the new `country` column according to the map dictionary.

In [None]:
# Countries map to rename
country_to_rename = {'Palestine':'Palestinian Territories',
                     'Republic of the Congo':'Congo (Brazzaville)',
                     'DR Congo':'Congo (Kinshasa)',
                     'Macedonia':'North Macedonia',
                     'Trinidad and Tobago':'Trinidad & Tobago'
                    }

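# Copy `name` into `country` and rename in one step
# (assigning the result avoids a chained in-place replace, which newer pandas versions warn about)
pop['country'] = pop['name'].replace(country_to_rename)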

pop.sample(5, random_state=3) # check the values randomly

Let's combine the happiness data with the population.

In [None]:
happiness_geo_pop = happiness_geo.merge(pop, how='left', on='country', suffixes=(None, '_pop'))
happiness_geo_pop[happiness_geo_pop.isnull().any(axis=1)] # Check for full match

`Kosovo` and `Northern Cyprus` are disputed areas and are not present in the World Population Review data.  
We should also adjust the population of the related countries, Serbia and Cyprus, respectively.  
So, we'll fill in the population for these areas from the following resources:

- [Wikipedia Northern Cyprus](https://en.wikipedia.org/wiki/Northern_Cyprus)
- [Wikipedia Demographics of Cyprus](https://en.wikipedia.org/wiki/Demographics_of_Cyprus#Population)
- [Wikipedia Demographics of Serbia](https://en.wikipedia.org/wiki/Demographics_of_Serbia)
- [Wikipedia Demographics of Kosovo](https://en.wikipedia.org/wiki/Demographics_of_Kosovo) 

In [None]:
pop2019_update = {'Kosovo':1788891,
                  'Northern Cyprus':326000,
                  'Cyprus':888000,
                  'Serbia':6945235
                 }

for cntr in pop2019_update:
    happiness_geo_pop.loc[happiness_geo_pop['country'] == cntr, 'pop2019'] = pop2019_update[cntr]
    happiness_geo_pop.loc[happiness_geo_pop['country'] == cntr, 'name_pop'] = cntr

happiness_geo_pop[happiness_geo_pop['country'].isin(pop2019_update.keys())]

Below, we'll check the prepared data.

In [None]:
happiness_geo_pop.info()

We have 156 countries with no empty values.  
Let's compute the weighted mean happiness scores by continent and region.

### Calculating the weighted average happiness score

In [None]:
# Define the weighted mean function according to the above formula
mean_weighted = lambda x: np.average(x, weights=happiness_geo_pop.loc[x.index, 'pop2019'])

By continent:

In [None]:
happiness_pv_cont = happiness_geo_pop.pivot_table(values='score', index='continent',
                                                  aggfunc=mean_weighted, margins=True
                                                 )
happiness_pv_cont = happiness_pv_cont.reset_index().sort_values(by='score', ascending=False)
happiness_pv_cont

By region:

In [None]:
happiness_pv_reg = happiness_geo_pop.pivot_table(values='score', index='region_wb',
                                                 aggfunc=mean_weighted, margins=True
                                                )
happiness_pv_reg = happiness_pv_reg.reset_index().sort_values(by='score', ascending=False)
happiness_pv_reg
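
For comparison, the same weighted mean can be computed without a custom aggregation function by summing `score * population` and `population` per group and then dividing. This is only an alternative sketch on invented data. The frame `toy` and its values are assumptions, and the notebook itself uses `pivot_table` with `mean_weighted`.

```python
import pandas as pd

# Invented data: two continents, three countries
toy = pd.DataFrame({
    'continent': ['X', 'X', 'Y'],
    'score': [6.0, 4.0, 5.0],
    'pop2019': [1_000_000, 3_000_000, 2_000_000],
})

# Sum the numerator and denominator of the formula per continent
sums = (toy.assign(weighted=toy['score'] * toy['pop2019'])
           .groupby('continent')[['weighted', 'pop2019']]
           .sum())

# Weighted mean per continent and for all rows together (the "All" margin)
by_continent = sums['weighted'] / sums['pop2019']
overall = sums['weighted'].sum() / sums['pop2019'].sum()

print(by_continent)
print('All: {:.4f}'.format(overall))
```

This version does not depend on looking up weights by index inside the aggregation function, at the cost of a temporary helper column.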

Next, we'll plot the ratings.

## Happiness Rating by Continent

In [None]:
%matplotlib inline

# Turn on svg rendering
%config InlineBackend.figure_format = 'svg'

# Color palette for the blog
snark_palette = ['#e0675a', # red
                 '#5ca0af', # green
                 '#edde7e', # yellow
                 '#211c47' # dark blue
                ]

In [None]:
# Color palette for the data
palette = 'BrBG_r'

# Inscriptions
title = """World Happiness by Continent in 2019"""
description = """
Continents with weighted average happiness scores on a 10-point scale based on country scores.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='ticks', palette='BrBG',
        rc={'xtick.major.size': 4, 'ytick.left':False,
            'axes.spines.left': False, 'axes.spines.bottom': True,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
f, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w')
sns.barplot(x='score', y='continent',
            data=happiness_pv_cont[happiness_pv_cont['continent'] != 'All'],
            orient='h', palette=palette, ax=ax
           )
ax.axvline(x=happiness_pv_cont.loc[(happiness_pv_cont['continent'] == 'All'), 'score'].item(),
           ymin=0, ymax=0.99,
           marker='_', linestyle='--',
           color=snark_palette[-1], label='World'
          ) # World score

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='left', pad=0, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.305, 0.004), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.text(s='World', x=5.25, y=6.2, c=snark_palette[-1]) # set label for the World score
ax.set_xlabel('Score', x=1.02, ha='right', c=snark_palette[-1]) # set label of x axis
ax.set_ylabel('') # hide the label of y axis
ax.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks
ax.tick_params(axis='y', labelsize=12, colors=snark_palette[-1]) # set y ticks
ax.spines['bottom'].set_color(snark_palette[-1]) # color x axis

# Save and plot
f.subplots_adjust(bottom=0.025, top=0.88, left=0.025, right=0.9) # adjust for the post picture (this figure is `f`, not the map's `fig`)
plt.savefig('/kaggle/working/plot.happiness.continent.png', dpi=150, bbox_inches='tight')
plt.show()

## Happiness Rating by Region

In [None]:
%matplotlib inline

# Turn on svg rendering
%config InlineBackend.figure_format = 'svg'

# Color palette for the blog
snark_palette = ['#e0675a', # red
                 '#5ca0af', # green
                 '#edde7e', # yellow
                 '#211c47' # dark blue
                ]

In [None]:
# Color palette for the data
palette = 'BrBG_r'

# Inscriptions
title = """World Happiness by Region in 2019"""
description = """
Regions with weighted average happiness scores on a 10-point scale based on country scores.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='ticks', palette='BrBG',
        rc={'xtick.major.size': 4, 'ytick.left':False,
            'axes.spines.left': False, 'axes.spines.bottom': True,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
f, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w')
sns.barplot(x='score', y='region_wb',
            data=happiness_pv_reg[happiness_pv_reg['region_wb'] != 'All'],
            orient='h', palette=palette, ax=ax
           )
ax.axvline(x=happiness_pv_reg.loc[(happiness_pv_reg['region_wb'] == 'All'), 'score'].item(),
           ymin=0, ymax=0.99,
           marker='_', linestyle='--',
           color=snark_palette[-1], label='World'
          ) # World score

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='left', pad=0, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.315, 0.004), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.text(s='World', x=5.25, y=6.2, c=snark_palette[-1]) # set label for the World score
ax.set_xlabel('Score', x=1.02, ha='right', c=snark_palette[-1]) # set label of x axis
ax.set_ylabel('') # hide the label of y axis
ax.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks
ax.tick_params(axis='y', labelsize=12, colors=snark_palette[-1]) # set y ticks
ax.spines['bottom'].set_color(snark_palette[-1]) # color x axis

# Save and plot
f.subplots_adjust(bottom=0.025, top=0.88, left=0.025, right=0.9) # adjust for the post picture (this figure is `f`, not the map's `fig`)
plt.savefig('/kaggle/working/plot.happiness.region.png', dpi=150, bbox_inches='tight')
plt.show()

## TOP5 Countries

Let's show the five countries with the highest happiness scores and the five with the lowest happiness scores.

In [None]:
happiness_top5 = (happiness_geo_pop[['country', 'rank', 'score']] # get only the required columns
                      .sort_values(by='score', ascending=False) # sort from the highest to the lowest score
                      .iloc[[*range(5), *range(-5, 0)]] # select the first 5 rows (happiest) and the last 5 rows (least happy)
                 )
happiness_top5
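
An alternative way to pick both ends of the ranking is `nlargest` and `nsmallest`, which skip the full sort. The sketch below uses an invented frame (`toy`) and takes two rows from each end instead of five to keep it short.

```python
import pandas as pd

# Invented scores for seven imaginary countries
toy = pd.DataFrame({'country': list('ABCDEFG'),
                    'score': [3.1, 7.5, 5.0, 6.2, 4.4, 7.0, 2.9]})

# Two highest scores, already ordered from high to low
top = toy.nlargest(2, 'score')

# Two lowest scores, reordered from high to low to match the plot order
bottom = toy.nsmallest(2, 'score').sort_values('score', ascending=False)

extremes = pd.concat([top, bottom])
print(extremes)
```

The result has the same layout as `happiness_top5`: the happiest rows first, then the least happy ones, all in descending order of score.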

In [None]:
# Color palette for the data
palette = 'BrBG_r'

# Inscriptions
title = """Top5 Happiest and Least Happy Countries"""
description = """Countries with the highest and lowest happiness scores on a 10-point scale.
Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness | Author: @data.sugar
"""

# Plot size
figsize = (6,4)

# Set the figure
sns.set(context='paper', style='ticks', palette='BrBG',
        rc={'xtick.major.size': 4, 'ytick.left':False,
            'axes.spines.left': False, 'axes.spines.bottom': True,
            'axes.spines.right': False, 'axes.spines.top': False
           }
       )

# Create the plot
f, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w')
sns.barplot(x='score', y='country', data=happiness_top5,
            orient='h', palette=palette, ax=ax
           )
ax.axvline(x=happiness_pv_reg.loc[(happiness_pv_reg['region_wb'] == 'All'), 'score'].item(),
           ymin=0, ymax=0.996,
           marker='_', linestyle='--',
           color=snark_palette[-1], label='World'
          ) # World score

# Set some aesthetic params for the plot
ax.set_title(title, fontdict={'fontsize': 16}, loc='left', pad=0, c=snark_palette[-1]) # set a title of the plot
ax.annotate(description, xy=(0.3, 0.004), size=6, xycoords='figure fraction', c=snark_palette[-1])
ax.text(s='World', x=5.25, y=9.2, c=snark_palette[-1]) # set label for the World score
ax.set_xlabel('Score', x=1.02, ha='right', c=snark_palette[-1]) # set label of x axis
ax.set_ylabel('') # hide the label of y axis
ax.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks
ax.tick_params(axis='y', labelsize=12, colors=snark_palette[-1]) # set y ticks
ax.spines['bottom'].set_color(snark_palette[-1]) # color x axis

# Save and plot
f.subplots_adjust(bottom=0.025, top=0.88, left=0.025, right=0.9) # adjust for the post picture (this figure is `f`, not the map's `fig`)
plt.savefig('/kaggle/working/plot.happiness.top5.png', dpi=150, bbox_inches='tight')
plt.show()

## Conclusions

Here are our thoughts on the World Happiness Map.  

Let's distinguish three groups of cultures: [Western culture, Eastern culture](https://en.wikipedia.org/wiki/East%E2%80%93West_dichotomy) and others.  

On the map, the happiest areas are those where Western culture has the greatest influence on people's minds.  

There are many different cultures in Africa. Eastern culture, for example, covers India and China. Russia and the countries of South America belong to mixed cultures, where the influence of Western values is nevertheless felt.  

It seems that the countries where Western culture is widespread are the happiest.  

However, we suspect that the method of evaluating happiness is not well suited to other cultures, so the map may reflect the spread of Western culture in the world more than happiness itself.  

Even so, we are surprised that the least happy region is South Asia.