# World Happiness Report 2021 Analysis

This notebook is an exploratory data analysis of the 2021 World Happiness Report.\
The dataset can be found at: <https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021>

The interactive graphs in this notebook can be seen using this link: <https://nbviewer.org/github/jishnu3000/portfolio-projects/blob/main/world_happiness_2021_analysis.ipynb>

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import chart_studio.plotly as py
import cufflinks as cf
import plotly.express as px
import plotly.io as pio
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Import Dataset and Initial Exploration

In [2]:
df = pd.read_csv('world-happiness-report-2021.csv')

In [3]:
df.head()

Unnamed: 0,Country name,Regional indicator,Happiness score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798


In [4]:
# select the columns to analyse
cols = ['Country name', 'Regional indicator', 'Happiness score', 'Logged GDP per capita', 'Social support', 'Healthy life expectancy', 
        'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']

df2 = df[cols].copy()

In [5]:
df2.rename(columns={
    'Country name': 'country',
    'Regional indicator': 'regional_indicator',
    'Happiness score': 'happiness_score',
    'Logged GDP per capita': 'GDP_per_capita',
    'Social support': 'social_support',
    'Healthy life expectancy': 'healthy_life_expectancy',
    'Freedom to make life choices': 'freedom_to_make_life_choices',
    'Generosity' : 'generosity', 
    'Perceptions of corruption': 'perceptions_of_corruption'
}, inplace=True)

In [6]:
df2.head()

Unnamed: 0,country,regional_indicator,happiness_score,GDP_per_capita,social_support,healthy_life_expectancy,freedom_to_make_life_choices,generosity,perceptions_of_corruption
0,Finland,Western Europe,7.842,10.775,0.954,72.0,0.949,-0.098,0.186
1,Denmark,Western Europe,7.62,10.933,0.954,72.7,0.946,0.03,0.179
2,Switzerland,Western Europe,7.571,11.117,0.942,74.4,0.919,0.025,0.292
3,Iceland,Western Europe,7.554,10.878,0.983,73.0,0.955,0.16,0.673
4,Netherlands,Western Europe,7.464,10.932,0.942,72.4,0.913,0.175,0.338


In [7]:
# check for null values
df2.isnull().sum()

country                         0
regional_indicator              0
happiness_score                 0
GDP_per_capita                  0
social_support                  0
healthy_life_expectancy         0
freedom_to_make_life_choices    0
generosity                      0
perceptions_of_corruption       0
dtype: int64

In [8]:
# check for duplicate rows
df2.duplicated().sum()

0

- There appear to be no null values and no duplicates.

In [9]:
df2.columns

Index(['country', 'regional_indicator', 'happiness_score', 'GDP_per_capita',
       'social_support', 'healthy_life_expectancy',
       'freedom_to_make_life_choices', 'generosity',
       'perceptions_of_corruption'],
      dtype='object')

## Exploratory Data Analysis

### Correlation Heatmap

In [10]:
corr = df2.corr(method='pearson', numeric_only=True).round(2)

fig = px.imshow(
    corr, 
    text_auto=True, 
    color_continuous_scale='RdBu',
)
fig.layout.height = 700
fig.layout.width = 900
fig.show()

- We see that the happiness score has a strong positive correlation with GDP per capita, social support, healthy life expectancy, and freedom to make life choices.
- The happiness score also has a weak negative correlation with perceptions of corruption.
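
The heatmap shows every pairwise correlation at once. To read off the relationships with the happiness score alone, one option is to pull out that single column of the correlation matrix and sort it by absolute strength. The sketch below is an illustration only: `rank_correlations` is a hypothetical helper, not part of the original notebook, and the five-row `sample` frame (copied from the `df2.head()` output above) is far too small to give meaningful coefficients.

```python
import pandas as pd

# Toy sample: the five rows shown by df2.head() earlier in this notebook.
# It only demonstrates the mechanics; use the full df2 for real values.
sample = pd.DataFrame({
    'country': ['Finland', 'Denmark', 'Switzerland', 'Iceland', 'Netherlands'],
    'happiness_score': [7.842, 7.620, 7.571, 7.554, 7.464],
    'GDP_per_capita': [10.775, 10.933, 11.117, 10.878, 10.932],
    'social_support': [0.954, 0.954, 0.942, 0.983, 0.942],
    'perceptions_of_corruption': [0.186, 0.179, 0.292, 0.673, 0.338],
})


def rank_correlations(frame, target='happiness_score'):
    """Pearson correlation of each numeric column with `target`,
    ordered from strongest to weakest absolute correlation."""
    corr = frame.corr(method='pearson', numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranked = rank_correlations(sample)
print(ranked.round(2))
```

In the notebook itself, `rank_correlations(df2)` would produce the same ranking for all 149 countries.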

### Happiness Score vs GDP Scatter Plot

In [11]:
fig = px.scatter(
    df2,
    x='happiness_score',
    y='GDP_per_capita',
    color='regional_indicator',
    title='Happiness Score vs GDP',
    labels={
        'happiness_score': 'Happiness Score',
        'GDP_per_capita': 'Logged GDP per capita',
        'regional_indicator': 'Region'
    },
    size='GDP_per_capita',
    hover_data=['country'],
)
fig.layout.height = 700
fig.layout.width = 1200
fig.show()

### GDP By Region Pie Chart

In [12]:
gdp_region = df2.groupby('regional_indicator').sum(numeric_only=True)['GDP_per_capita'].reset_index()
gdp_region

Unnamed: 0,regional_indicator,GDP_per_capita
0,Central and Eastern Europe,171.854
1,Commonwealth of Independent States,112.822
2,East Asia,62.206
3,Latin America and Caribbean,187.4
4,Middle East and North Africa,164.324
5,North America and ANZ,43.238
6,South Asia,60.778
7,Southeast Asia,84.793
8,Sub-Saharan Africa,290.707
9,Western Europe,227.277


In [13]:
fig = px.pie(
    gdp_region,
    values='GDP_per_capita',
    names='regional_indicator',
    title='GDP by Region',
    labels={
        'happiness_score': 'Happiness Score',
        'GDP_per_capita': 'Total GDP',
        'regional_indicator': 'Region'
    }
)
fig.layout.height = 600
fig.layout.width = 1100
fig.show()

### Total Countries in each Region

In [14]:
total_countries = df2.groupby('regional_indicator')[['country']].count().sort_values(by='country', ascending=False).reset_index()
total_countries

Unnamed: 0,regional_indicator,country
0,Sub-Saharan Africa,36
1,Western Europe,21
2,Latin America and Caribbean,20
3,Central and Eastern Europe,17
4,Middle East and North Africa,17
5,Commonwealth of Independent States,12
6,Southeast Asia,9
7,South Asia,7
8,East Asia,6
9,North America and ANZ,4
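
The regional GDP totals shown earlier are sums of a per-country value (logged GDP per capita), so a region with more countries gets a larger total regardless of how wealthy its countries are. Dividing each regional sum by the country count above gives a per-country average that may be easier to compare. The sketch below is plain Python using only the numbers printed in the two tables; the variable names are illustrative.

```python
# Regional sums of logged GDP per capita, copied from the gdp_region output.
gdp_sum = {
    'Central and Eastern Europe': 171.854,
    'Commonwealth of Independent States': 112.822,
    'East Asia': 62.206,
    'Latin America and Caribbean': 187.4,
    'Middle East and North Africa': 164.324,
    'North America and ANZ': 43.238,
    'South Asia': 60.778,
    'Southeast Asia': 84.793,
    'Sub-Saharan Africa': 290.707,
    'Western Europe': 227.277,
}

# Number of countries per region, copied from the total_countries output.
n_countries = {
    'Sub-Saharan Africa': 36,
    'Western Europe': 21,
    'Latin America and Caribbean': 20,
    'Central and Eastern Europe': 17,
    'Middle East and North Africa': 17,
    'Commonwealth of Independent States': 12,
    'Southeast Asia': 9,
    'South Asia': 7,
    'East Asia': 6,
    'North America and ANZ': 4,
}

# Mean logged GDP per capita per region = regional sum / country count.
mean_gdp = {region: gdp_sum[region] / n_countries[region] for region in gdp_sum}

# Print regions from highest to lowest mean.
for region, value in sorted(mean_gdp.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{region:<36}{value:.3f}')
```

With the full DataFrame, the same averages should be obtainable directly with `df2.groupby('regional_indicator')['GDP_per_capita'].mean()`. On these figures Sub-Saharan Africa has the largest sum but the lowest mean, which suggests its large total mostly reflects its 36 countries.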


### Perceived Corruption in each Region Bar Plot

In [15]:
corruption = df2.groupby('regional_indicator').mean(numeric_only=True)[['perceptions_of_corruption']].reset_index()
corruption

Unnamed: 0,regional_indicator,perceptions_of_corruption
0,Central and Eastern Europe,0.850529
1,Commonwealth of Independent States,0.725083
2,East Asia,0.683333
3,Latin America and Caribbean,0.7926
4,Middle East and North Africa,0.762235
5,North America and ANZ,0.44925
6,South Asia,0.797429
7,Southeast Asia,0.709111
8,Sub-Saharan Africa,0.765944
9,Western Europe,0.523095


In [16]:
fig = px.bar(
    corruption,
    x='regional_indicator',
    y='perceptions_of_corruption',
    title='Perceived Corruption in each Region',
    labels={
        'happiness_score': 'Happiness Score',
        'regional_indicator': 'Region',
        'perceptions_of_corruption': 'Corruption'
    },
    color='regional_indicator'
)
fig.layout.height = 500
fig.layout.width = 1000
fig.update_layout(xaxis={'categoryorder': 'total descending'})
fig.show()

### Top 10 Happiest Countries Life Expectancy

In [17]:
fig = px.bar(
    df2.head(10),
    x='country',
    y='healthy_life_expectancy',
    title='Top 10 Happiest Countries Life Expectancy',
    labels={
        'happiness_score': 'Happiness Score',
        'country': 'Country',
        'healthy_life_expectancy': 'Life Expectancy',
        'regional_indicator': 'Region'
    },
    color='regional_indicator',
    hover_data=['happiness_score', 'regional_indicator']
)
fig.layout.height = 500
fig.layout.width = 1000
fig.update_layout(xaxis={'categoryorder': 'total descending'})
fig.show()

### 10 Least Happy Countries Life Expectancy

In [18]:
fig = px.bar(
    df2.tail(10),
    x='country',
    y='healthy_life_expectancy',
    title='10 Least Happy Countries Life Expectancy',
    labels={
        'happiness_score': 'Happiness Score',
        'country': 'Country',
        'healthy_life_expectancy': 'Life Expectancy',
        'regional_indicator': 'Region'
    },
    color='regional_indicator',
    hover_data=['happiness_score', 'regional_indicator']
)
fig.layout.height = 500
fig.layout.width = 1000
fig.update_layout(xaxis={'categoryorder': 'total descending'})
fig.show()
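
The two bar charts above use `df2.head(10)` and `df2.tail(10)`, which select the right countries only while the frame is still ordered by happiness score, as it is in the source CSV. A variant that does not depend on row order is to select by value with `nlargest` and `nsmallest`. The sketch below uses a deliberately shuffled five-row sample taken from the `df2.head()` output; the `sample`, `top2`, and `bottom2` names are illustrative.

```python
import pandas as pd

# Five rows from the df2.head() output, shuffled on purpose so that
# head()/tail() would no longer pick the happiest/least happy rows.
sample = pd.DataFrame({
    'country': ['Netherlands', 'Finland', 'Iceland', 'Denmark', 'Switzerland'],
    'happiness_score': [7.464, 7.842, 7.554, 7.620, 7.571],
    'healthy_life_expectancy': [72.4, 72.0, 73.0, 72.7, 74.4],
})

# Select by value rather than by position.
top2 = sample.nlargest(2, 'happiness_score')
bottom2 = sample.nsmallest(2, 'happiness_score')

print(top2['country'].tolist())     # ['Finland', 'Denmark']
print(bottom2['country'].tolist())  # ['Netherlands', 'Iceland']
```

In the notebook, `df2.nlargest(10, 'happiness_score')` and `df2.nsmallest(10, 'happiness_score')` could be passed to `px.bar` in place of `df2.head(10)` and `df2.tail(10)`.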

### Happiness Score vs Freedom to Make Life Choices Scatter Plot

In [19]:
fig = px.scatter(
    df2,
    x='freedom_to_make_life_choices',
    y='happiness_score',
    color='regional_indicator',
    title='Happiness Score vs Freedom to Make Life Choices',
    labels={
        'happiness_score': 'Happiness Score',
        'freedom_to_make_life_choices': 'Freedom to make life choices',
        'regional_indicator': 'Region',
        'country': 'Country'
    },
    size='happiness_score',
    hover_data=['country'],
)

fig.layout.height = 700
fig.layout.width = 1200

fig.show()

### Countries with Highest Perception of Corruption Bar Plot

In [20]:
# ten countries with the highest perceived corruption
country = df2.sort_values(by='perceptions_of_corruption', ascending=False).head(10)

fig = px.bar(
    country,
    x='country',
    y='perceptions_of_corruption',
    title='Countries with Highest Perception of Corruption',
    labels={
        'happiness_score': 'Happiness Score',
        'country': 'Country',
        'healthy_life_expectancy': 'Life Expectancy',
        'regional_indicator': 'Region',
        'perceptions_of_corruption': 'Corruption'
    },
    color='regional_indicator',
    hover_data=['happiness_score', 'regional_indicator']
)
fig.layout.height = 500
fig.layout.width = 1000
fig.update_layout(xaxis={'categoryorder': 'total descending'})
fig.show()

### Happiness Score vs Corruption Scatter Plot

In [21]:
fig = px.scatter(
    df2,
    x='happiness_score',
    y='perceptions_of_corruption',
    color='regional_indicator',
    title='Happiness Score vs Corruption',
    labels={
        'happiness_score': 'Happiness Score',
        'perceptions_of_corruption': 'Corruption',
        'regional_indicator': 'Region',
        'country': 'Country'
    },
    size='happiness_score',
    hover_data=['country'],
)

fig.layout.height = 700
fig.layout.width = 1200

fig.show()

### Happiness Score vs Social Support Scatter Plot

In [22]:
fig = px.scatter(
    df2,
    x='happiness_score',
    y='social_support',
    color='regional_indicator',
    title='Happiness Score vs Social Support',
    labels={
        'happiness_score': 'Happiness Score',
        'social_support': 'Social Support',
        'regional_indicator': 'Region',
        'country': 'Country'
    },
    size='happiness_score',
    hover_data=['country'],
)

fig.layout.height = 700
fig.layout.width = 1200

fig.show()

---